Abstract

We consider a model of fixed size $N = 2^l$ in which there are $l$ generations of daughter cells
and a stem cell. In each generation $i$ there are $2^{i-1}$ daughter cells. At each integral time
unit the cells split so that the stem cell splits into a stem cell and a generation 1 daughter cell,
and the generation $i$ daughter cells become two cells of generation $i + 1$. The last generation
is removed from the population. The stem cell gets first and second mutations at rates $u_1$
and $u_2$, and the daughter cells get first and second mutations at rates $v_1$ and $v_2$. We find the
distribution for the time it takes to get two mutations as $N$ goes to infinity and the mutation
rates go to 0. We also find the distribution for the location of the mutations. Several outcomes
are possible depending on how fast the rates go to 0. The model considered has been proposed
by Komarova (2007) as a model for colon cancer.
1 Introduction

In the 1950's Armitage and Doll [1] proposed that cancer may be the end result of an accumulation
of two or more cell mutations. Komarova [5] discusses three mathematical models which may be
used to model the mutations that lead to cancer. The first is the Moran model, which may be
used to model cancers in liquids such as leukemia. In this model there is a fixed population of
size $N$. There is a rate $\mu$ at which cells are getting mutations. Each cell in the population dies at
rate 1 and is replaced by any individual in the population, including itself, with equal probability.
The second is a spatial model which may be used to model cancers in solid tissues. This model
is similar to the Moran model except that the cells are given spatial locations and when they die
they are only replaced by nearby cells. The third model, the one we focus on in this paper, is
referred to as the hierarchical model in [5]. This model may be used for colon cancer.

As discussed in [5], many cells in the human body, including those in the colon, go through 
a three step process. It begins with a stem cell which will stay in the population for a long time 
and have many descendants. Some of these descendants will also be stem cells, but others will be 
differentiated progenitor cells. The progenitor cells, or what we shall refer to as daughter cells in
this paper, will split into more daughter cells. The number of times these cells split is dependent 
upon what organ of the body they are in. We will refer to the number of splits that a daughter 
cell has undergone as the generation of the cell. Once the cells split enough times they reach 
maturity and are swept out of the population in a biological process called apoptosis. 

The colon is lined with crypts that contain pockets of cells. The cells in the colon, as described 
by Komarova in [7], are such that stem cells reside at the bottom of the crypt and the daughters 
migrate up the crypt so that the higher generation daughter cells are near the top. We assume 
that cancer is the result of two mutations, as is done in [5]. There are three ways in which the
mutations may occur. The stem cell may acquire both mutations so that cancer is a result of
mutations of the stem cell only. It is possible that the stem cell receives the first mutation and a
daughter cell gets the second, or a daughter cell and one of its descendants will each receive 
mutations before they are swept from the crypt. In [5] these cases are abbreviated as ss, sd and 
dd respectively. 

The hierarchical model shall be referred to as $H_1$. This model has a fixed population of size
$N = 2^l$ where $l$ is the number of generations of daughter cells in the crypt. There is one stem
cell and for $k \in \{1, 2, \dots, l\}$ there are $2^{k-1}$ daughter cells of generation $k$. We start with a full
crypt and no mutations. At each integral time unit all of the cells split in the following way:

• The stem cell splits into a stem cell and a generation 1 daughter cell. 

• For each generation $k$ with $1 \le k \le l - 1$, a daughter cell of generation $k$ will split into two
cells of generation $k + 1$.

• The daughter cells of generation $l$ undergo apoptosis and are swept from the population.
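The splitting rules above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not part of the model's analysis; the function names `generation_sizes` and `split_once` are hypothetical, and the sketch tracks cell counts per generation only, ignoring mutations.

```python
def generation_sizes(l):
    """Return [stem, gen 1, ..., gen l] cell counts for a crypt with l generations."""
    return [1] + [2 ** (k - 1) for k in range(1, l + 1)]


def split_once(sizes):
    """Apply one synchronous split to the per-generation cell counts."""
    l = len(sizes) - 1
    new_sizes = [0] * (l + 1)
    new_sizes[0] = sizes[0]          # the stem cell renews itself
    new_sizes[1] = sizes[0]          # and produces one generation-1 daughter
    for k in range(1, l):            # generation k -> two cells of generation k + 1
        new_sizes[k + 1] = 2 * sizes[k]
    # generation l is removed (apoptosis), so it contributes nothing
    return new_sizes


if __name__ == "__main__":
    sizes = generation_sizes(5)
    print(sizes, sum(sizes))          # total population is 2**5
    print(split_once(sizes) == sizes) # the full crypt is invariant under one split
```

Starting from a full crypt, one split reproduces the same counts, which is the sense in which the population size stays fixed at $N = 2^l$.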

Notice that the generations are constant size throughout time. The cells will accumulate muta- 
tions via Poisson processes. A cell with 0, 1 or 2 mutations is called a type-0, type-1 or type-2 
cell respectively. A mutation which occurs on a type-0 or type-1 cell is called a type-1 or type-2 
mutation respectively. This terminology is used so that a mutation that makes a cell type-2 is 
called a type-2 mutation. Once a type-2 mutation occurs the colon is assumed to have cancer. 
The cells will each have two Poisson processes marking them, one which will cause type-1 muta- 
tions and one which will cause type-2 mutations. The first Poisson process that marks a cell will 
only cause a type-1 mutation if the cell is a type-0. If a mark of the Poisson process occurs while 
the cell is not a type-0 then the mutation is rejected. Likewise, the second Poisson process only 
causes mutations on type-1 cells. If a mark from this Poisson process occurs on a cell while it is 
type-1 then the cell becomes type-2, but if the cell is not a type-1 then nothing happens. All of 
the Poisson processes are independent. The mutations are passed to the descendants when a cell 
splits. It is sometimes convenient to think of the cells as fixed in a binary tree and the mutations 
as traveling through the tree in a direction which takes them from the root to the leaves. Because 
of this we will often refer to the sequence of stem cells as the stem cell and we fix the Poisson 
processes that are marking the cells on particular locations in the tree. 

The rates at which the stem cell acquires type-1 and type-2 mutations are $u_1$ and $u_2$ respectively.
The rates at which the daughter cells get type-1 and type-2 mutations are $v_1$ and $v_2$
respectively. Each of the rates is a function of $N$ and will approach 0 as $N$ approaches infinity.
We will always consider what happens as $N$ goes to infinity. All limits will be assumed as taking
$N$ to infinity unless otherwise stated.



We let $\tau(A_1)$ be the first time that any cell gets a type-2 mutation, where $A_1$ refers to a
model. We call a type-1 mutation to a cell which has a type-2 descendant successful. A type-1
mutation to a stem cell is always successful and a type-1 mutation to a daughter is successful
if the daughter has a type-2 descendant before its progeny is washed from the population. We
will call the successful type-1 mutation whose type-2 descendant is the first type-2 to occur the
cancer causing type-1 mutation. Being the cancer causing type-1 mutation is not equivalent to
being the first successful type-1 mutation. We also define random variables $\sigma(A_1)$ and $\rho(A_1)$ to
be the depth of the colon at which the cancer causing type-1 and first type-2 mutations occur,
respectively. More precisely, if the cancer causing type-1 mutation occurs in generation $j$ then
we define $\sigma(A_1) = j/l$ and if the first type-2 mutation occurs in generation $k$ then we define
$\rho(A_1) = k/l$. If the cancer causing type-1 mutation or first type-2 mutation occurs on the stem
cell then $\sigma(A_1) = 0$ or $\rho(A_1) = 0$ respectively.

The above establishes most of the notation that will be used throughout this paper, but some
more will be included here. For any real number $a$ we define $a^+ = a \vee 0$. For functions $f(x)$ and
$g(x)$ we will denote the limits $f(x)/g(x) \to 0$, $f(x)/g(x) \to 1$, and $f(x)/g(x) \to \infty$ as $x \to \infty$ by
$f \ll g$, $f \sim g$ and $f \gg g$ respectively. We will also assume that there always exists a constant
$\alpha > 0$ such that when $\epsilon > 0$ we have

$$N^{-\alpha-\epsilon} \ll v_2 \ll N^{-\alpha+\epsilon}. \qquad (1)$$

If $\alpha = 0$ then the mutation rates are too fast to be realistic. To reduce the number of subscripts,
we will use $\log x$ for $\log_2 x$. We will use $\to_d$ to denote convergence in distribution and $\to_p$ to
denote convergence in probability.

One of the two goals of this paper is to find the asymptotic distribution of $\tau(H_1)$ as $N$
approaches infinity. Similar work has been done for the Moran model by Schweinsberg in [9]
and Durrett, Schmidt and Schweinsberg in [3] in which more general results have already been
found. In [5], Komarova makes a connection between the Moran model and the hierarchical one as
follows: In the Moran model a mutation may undergo fixation, meaning it spreads throughout the
entire population through the birth-death process and all of the cells are the same type. Because
the last generation is always removed in the hierarchical model, the only way to get fixation is
if the stem cell gets mutated. These are the cases ss and sd. In these cases the mutation will
spread throughout the population in $l$ time units. In the Moran model it is also possible that
the mutations undergo what is called stochastic tunneling. This is when multiple mutations are
acquired before they fixate. This is analogous to daughter cells acquiring two mutations before
the stem cell mutates in the hierarchical model. This is the dd case and can also happen in the
sd case if the second mutation occurs before the first has time to fixate (which is expected to
happen when $\alpha < 1$). The rate at which daughter cells get successful type-1 mutations is given
heuristically in [5] to be

One may arrive at this rate by noting that the $i$th generation has $2^{i-1}$ cells which get type-1
mutations at rate $v_1$. Each of the cells will have $2^{l-i+1} - 2$ descendants which live for one time
unit each and get type-2 mutations at rate $v_2$. The distribution of $\tau(H_1)$ will be one part of the main
theorem.
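The counting argument in the preceding paragraph can be written out as follows. This is a sketch of that argument only and is not claimed to be the exact expression given in [5]; the linearization in the second step is an assumption that holds when every exponent $v_2(2^{l-i+1} - 2)$ is small, for instance when $v_2 N \to 0$.

```latex
\sum_{i=1}^{l} 2^{i-1} v_1 \left(1 - e^{-v_2 (2^{l-i+1} - 2)}\right)
\;\approx\; v_1 v_2 \sum_{i=1}^{l} 2^{i-1}\left(2^{l-i+1} - 2\right)
\;=\; v_1 v_2 \left( l\, 2^{l} - 2\,(2^{l} - 1) \right)
\;\sim\; v_1 v_2 N \log N .
```

Here $2^{i-1} v_1$ is the total rate of type-1 mutations in generation $i$ and the factor in parentheses is the chance that at least one of the $2^{l-i+1} - 2$ descendants, each living one time unit, picks up a type-2 mutation.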



Our second goal is to determine which cells obtain the mutations that lead to cancer. The 
location of the mutations can be essential to the treatment of cancer. As an example, studies of 
the effects of the drug imatinib on chronic myeloid leukemia have shown that leukemic stem cells 
will most likely not cause tumors but rather that a tumor is a result of a mutation on one of the 
daughter cells, see Dingli and Michor ^ and Michor ^. Imatinib treats leukemic daughter cells 
but not leukemic stem cells. So while a patient is using imatinib, problems arising from the cancer are prevented.
However, patients cannot stop treatment because the leukemic stem cells will continue producing
new leukemic daughter cells. Therefore, the location of where the mutations occur may play a
pivotal role in determining how to treat the cancer. This is the other part of the main theorem
in which we determine the limiting distributions of $\sigma(H_1)$ and $\rho(H_1)$.

According to Komarova in [6] there are four cases that are particularly interesting from a 
biological viewpoint. 

1. The null model. In this model all of the mutation rates are equal. This model is the easiest
to work with: $u_1 = u_2 = v_1 = v_2$.

2. Chromosomal instability. In this case the probability of getting a second mutation is greater
than that of getting the first mutation, but the rates do not differ between stem cells or
daughter cells: $u_1 = v_1 < u_2 = v_2$.

3. Stem cells have a lower mutation rate: $v_1 = v_2 > u_1 = u_2$.

4. The problem of de-differentiation. In this case the daughter cells have a slower mutation
rate. There are two scenarios for this: $v_1 = v_2 < u_1 = u_2$ or $v_1 = u_1 = u_2 > v_2$.

We impose the following restriction on the mutation rates: $u_1 \le u_2$ and $v_1 \le v_2$. This will cover
all of the above scenarios except for the second of the de-differentiation cases.
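The claim that the restriction covers every scenario except the second de-differentiation case can be checked mechanically. The sketch below uses hypothetical numerical rates, chosen only so that each scenario's equalities and inequalities hold; the helper name `satisfies_restriction` is likewise an assumption of this illustration.

```python
def satisfies_restriction(u1, u2, v1, v2):
    """Check the restriction u1 <= u2 and v1 <= v2 on the four mutation rates."""
    return u1 <= u2 and v1 <= v2


# Hypothetical rates (u1, u2, v1, v2), chosen only to illustrate each scenario.
scenarios = {
    "null model": (1e-6, 1e-6, 1e-6, 1e-6),
    "chromosomal instability": (1e-6, 1e-4, 1e-6, 1e-4),
    "stem cells mutate more slowly": (1e-7, 1e-7, 1e-6, 1e-6),
    "de-differentiation, first scenario": (1e-5, 1e-5, 1e-6, 1e-6),
    "de-differentiation, second scenario": (1e-5, 1e-5, 1e-5, 1e-6),
}

if __name__ == "__main__":
    for name, rates in scenarios.items():
        print(f"{name}: {satisfies_restriction(*rates)}")
```

Only the last entry fails, because there $v_1 > v_2$.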

The following theorem is the goal of this paper. Recall that $\alpha$ is the number from (1).

Theorem 1. Recall that all limits are taken as $N$ goes to infinity. Let $X$ be a random variable
which has the exponential distribution with parameter 1 and let $Y$ be a random variable which
has the Rayleigh distribution so that $P(Y \le t) = 1 - e^{-t^2/2}$ for any $t > 0$.

1. If $v_1 v_2 \ll 1/(N(\log N)^2)$ and $v_1 v_2 N \log N \gg u_1$ then $(\alpha \wedge 1) v_1 v_2 N (\log N)\,\tau(H_1) \to_d X$.
The distribution of $\sigma(H_1)$ converges to the uniform distribution on $((1 - \alpha)^+, 1]$ and $\rho(H_1)$
converges in probability to 1.

2. If $1/(N(\log N)^2) \ll v_1 v_2 \ll 1/N$ and $\sqrt{v_1 v_2 N} \gg u_1$ then $\sqrt{v_1 v_2 N}\,\tau(H_1) \to_d Y$. Both $\sigma(H_1)$
and $\rho(H_1)$ converge in probability to 1.

3. If $v_1 v_2 \gg 1/N$ then $\sqrt{v_1 v_2 N}\,\tau(H_1) \to_d Y$. Both $\sigma(H_1)$ and $\rho(H_1)$ converge in probability
to 1.

4. If we have the following two conditions:

• Either $v_1 v_2 \ll 1/(N(\log N)^2)$ and $u_1 \gg v_1 v_2 N \log N$, or $1/(N(\log N)^2) \ll v_1 v_2 \ll 1/N$
and $u_1 \gg \sqrt{v_1 v_2 N}$

• Both $u_2 \ll 1/\log N$ and $u_2 \ll N v_2$

then $u_1 \tau(H_1) \to_d X$. The probability that the first mutation occurs on the stem cell converges
to 1 and $\rho(H_1)$ converges in probability to $\alpha \wedge 1$.

5. If we have the following two conditions:

• Either $v_1 v_2 \ll 1/(N(\log N)^2)$ and $u_1 \gg v_1 v_2 N \log N$, or $1/(N(\log N)^2) \ll v_1 v_2 \ll 1/N$
and $u_1 \gg \sqrt{v_1 v_2 N}$

• Either $u_2 \gg 1/\log N$ or $u_2 \gg N v_2$

then $u_1 \ll u_2$ implies $u_1 \tau(H_1) \to_d X$. On the other hand, if $u_1 \sim A u_2$ for some $A > 1$
then let $Z$ be an exponentially distributed random variable with parameter $1/A$ which is
independent of $X$. Then $u_1 \tau(H_1) \to_d X + Z$. In either case, the probability that both
mutations occur on the stem cell converges to 1.

The first three cases in Theorem 1 are what happens when the probability that the cancer
causing type-1 mutation occurs on a daughter cell converges to 1. Case 4 gives the results for
the sd regime and case 5 gives the results for the ss regime.

The third case is a result of fast mutation rates. That is, there will be so many mutations
that the probability of two mutations occurring before the model even has time to split once will
converge to 1. This reduces to computing the waiting time for the first of $N - 1$ Poisson processes
to receive two hits.
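The reduction to a two-hit waiting time can be illustrated numerically. The sketch below treats the daughter cells as independent, non-splitting cells, each of which receives a first hit at rate $v_1$ and then a second hit at rate $v_2$; this is an assumption of the illustration, justified only heuristically by the remark that both mutations arrive before the first split. The function names and the sample rates are hypothetical.

```python
import math


def two_hit_cdf_single(t, v1, v2):
    """P(one cell has both mutations by time t) when the first arrives at rate v1
    and the second, afterwards, at rate v2 (requires v1 != v2)."""
    # Equals 1 - (v2*exp(-v1*t) - v1*exp(-v2*t)) / (v2 - v1), written with expm1
    # to reduce cancellation when the rates are small.
    return (v1 * math.expm1(-v2 * t) - v2 * math.expm1(-v1 * t)) / (v2 - v1)


def first_two_hit_cdf(t, v1, v2, n_cells):
    """P(at least one of n_cells independent cells has both mutations by time t)."""
    p = two_hit_cdf_single(t, v1, v2)
    return -math.expm1(n_cells * math.log1p(-p))


def rayleigh_cdf(s):
    """CDF of the Rayleigh distribution used in the text: 1 - exp(-s^2 / 2)."""
    return -math.expm1(-s * s / 2.0)


if __name__ == "__main__":
    n = 2 ** 40
    v1, v2 = 1e-4, 2e-4                      # hypothetical rates with v1*v2*n large
    for s in (0.5, 1.0, 2.0):
        t = s / math.sqrt(v1 * v2 * n)       # undo the scaling sqrt(v1 v2 N) * tau
        print(s, first_two_hit_cdf(t, v1, v2, n - 1), rayleigh_cdf(s))
```

For a single cell the chance of two hits by time $t$ is close to $v_1 v_2 t^2/2$, so the minimum over $N - 1$ cells has survival function close to $e^{-v_1 v_2 N t^2/2}$, which is the Rayleigh limit after scaling time by $\sqrt{v_1 v_2 N}$.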

In both the first and second cases $\tau(H_1)$ goes to infinity in probability.
The results of these two cases rely on whether $P(\tau(H_1) < \log N)$ converges to 0 or 1.
As for the first case, $P(\tau(H_1) < \log N) \to 0$. The distribution of $\sigma(H_1)$ arises from a balance
between the large number of cells in the later generations versus the large number of descendants
of cells in the earlier generations.

For the second case $P(\tau(H_1) < \log N) \to 1$. In this case the mutations occur fast enough that
the number of descendants the cells have is not as important. This is why the cancer causing
type-1 mutation will occur in the later generations.

In both the first and second cases the second mutation will occur near the top of the crypt.
This is because most of the cells are at the top of the crypt. The distribution of $\tau(H_1)$ may be
best understood through the following picture:



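[Figure: An Alternative View of the Model. Horizontal axis: time, with $t_1$ and $t_2$ marked; vertical axis: cell generation, with 0 and $(1 - \alpha)l$ marked.]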

The horizontal axis is time and the vertical axis is cell generation. The circles represent
cell mutations. The circles within the rectangle represent successful type-1 mutations and the
other circles connected to these by a diagonal line which are located at the top of the graph
represent their type-2 descendants. The type-2 mutations are at the top of the graph because
the later generations are where we expect the type-2 mutations to occur. Likewise, the type-1
mutations are expected to occur in the last $\alpha l$ generations so they lie above the line marked at
$(1 - \alpha)l$. The infinite rectangle which is bounded between $(1 - \alpha)l$ and $l$ vertically and only by
0 on the left horizontally will be dotted within by successful type-1 mutations according to a
Poisson process of rate $v_1 v_2 N$. This Poisson process has a uniform rate horizontally because of
the time independence of the mutation rates and uniform rate vertically because of the balance
between the number of cells in the later generations and the number of descendants of cells in
the earlier generations. Notice that in the picture the cancer causing type-1 mutation is not the
first successful type-1 mutation.

At times $t_1$ and $t_2$ there are diagonal lines coming out of the top of the graph that enclose
a region of the rectangle. To have a type-2 cell by time $t_1$ we must have a successful type-1
mutation in the corresponding triangle. Likewise, to have a type-2 cell by time $t_2$ we must have
a successful type-1 mutation in the corresponding quadrilateral. Therefore, the rate at which
type-2 mutations occur is converging to the area enclosed in the graph by time $t$ multiplied by
the rate of successful type-1 mutations. For the first case we expect the time to get a second
mutation to be much larger than $l$, which is represented by $t_2$. Because of this, the waiting time
as marked on the graph will go infinitely to the right as $N$ goes to infinity and the quadrilateral
will be approximately a rectangle since the missing bottom right corner will have negligible area.
This will cause $\tau(H_1)$ to have an exponential distribution. For the second case we expect the
time to get a second mutation to be much smaller than $l$, which is represented by $t_1$. In this case
the area enclosed in the graph will always be a triangle so that the rate at which we expect to get
a type-2 mutation is asymptotic to a function of $t^2$. This results in convergence to the Rayleigh
distribution. As $N$ goes to infinity the triangle will be squeezed into the upper left corner.
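The area argument above can be made concrete with a short sketch. It assumes, as in the picture, that a successful type-1 mutation at time $s$ in generation $g$ produces its type-2 descendant at the top of the crypt at time $s + (l - g)$, and that successful type-1 mutations fall in the strip between $(1 - \alpha)l$ and $l$ at rate $v_1 v_2 N$ per unit area. The function names are hypothetical and the resulting formula is a heuristic, not the limit theorem itself.

```python
import math


def enclosed_area(t, l, alpha):
    """Area of {(s, g): s >= 0, (1 - alpha) l <= g <= l, s + (l - g) <= t}."""
    h = min(alpha, 1.0) * l          # height of the strip of generations
    if t <= 0:
        return 0.0
    if t <= h:
        return t * t / 2.0           # triangle
    return h * t - h * h / 2.0       # quadrilateral: rectangle minus the missing corner


def approx_cdf(t, l, alpha, v1, v2):
    """Heuristic P(tau <= t): 1 - exp(-v1 v2 N * area), with N = 2**l."""
    n = 2.0 ** l
    return -math.expm1(-v1 * v2 * n * enclosed_area(t, l, alpha))


if __name__ == "__main__":
    l, alpha = 10, 0.5
    for t in (3.0, 5.0, 7.0):
        print(t, enclosed_area(t, l, alpha), approx_cdf(t, l, alpha, 1e-3, 1e-3))
```

When $t$ is much smaller than $\alpha l$ the area is $t^2/2$, which gives the Rayleigh shape; when $t$ is much larger than $\alpha l$ the area is close to $\alpha l\, t$, which gives the exponential shape with rate $(\alpha \wedge 1) v_1 v_2 N \log N$.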

The convergence of $\sigma(H_1)$ in part 1 is particularly interesting. The location of the cancer
causing type-1 mutation is not immediately obvious because the generations with large numbers
of cells have fewer descendants on which a type-2 mutation might occur. This result reveals
how the high rates of type-1 mutations occurring on later generations balance with the high
probability of success of type-1 mutations which occur on earlier generations.

In all but one case $\rho(H_1)$ converges to 1. This happens because of the large number of cells
in the later generations. The exception in case 4 is caused by having a low $v_1$ and high $u_1$ and
$v_2$. The stem cell will get the first mutation because the daughter cells are slow to acquire type-1
mutations, but the daughter cells acquire type-2 mutations fast enough that they will get a type-2
mutation before all of the daughters inherit the type-1 mutation from the stem cell.

There are many boundary cases and most of them are not included in this paper, where we 
use the term boundary case to refer to the boundary between two of the conditions. That is, if 
$v_1 \ll 1/N$ gives one result and $v_1 \gg 1/N$ gives another, we would consider $v_1 \sim A/N$ for some
constant $A$ to be a boundary case. If included, the boundary cases would make up the bulk of
this paper. One reason for this is that our variables $\{v_1, v_2, u_1, u_2\}$ span a four dimensional space
so that the regions will have many boundaries. Moreover, sometimes three regions intersect in 
the same place. It does not seem that there would be any special difficulties in computing most 
of these boundary cases and that they could be done with the same methods used in this paper. 

The following proposition gives the results for the null-model, including results for the bound- 
ary cases. 

Proposition 2. Let $\mu = u_1 = u_2 = v_1 = v_2$. Let $X$ be a random variable which has the
exponential distribution with parameter 1. Let $Y$ be a random variable which has the Rayleigh
distribution.

1. If $\mu \ll 1/(N \log N)$ then $\mu\,\tau(H_1) \to_d X$. The probability that the first mutation occurs on the
stem cell converges to 1 and $\rho(H_1)$ converges in probability to 1.

2. If $\mu \sim A/(N \log N)$ then $(1 + A)\mu\,\tau(H_1) \to_d X$. Let $\xi$ be a Bernoulli random variable such
that $P(\xi = 1) = A/(1 + A)$ and $P(\xi = 0) = 1/(1 + A)$. Let $U$ be a random variable with
uniform distribution on $[0, 1]$. Then

$$\sigma(H_1) \to_d U\xi$$

and

$$\rho(H_1) \to_d \xi + (\alpha \wedge 1)(1 - \xi).$$

3. If $1/(N \log N) \ll \mu \ll 1/(\sqrt{N} \log N)$ then $(\alpha \wedge 1)\mu^2 N (\log N)\,\tau(H_1) \to_d X$. The distribution
of $\sigma(H_1)$ converges to a uniform distribution on $((1 - \alpha)^+, 1]$ and $\rho(H_1)$ converges in
distribution to 1.

4. If $\mu \sim A/(\sqrt{N} \log N)$ then

$$\lim P(\tau(H_1)/\log N \le t) = (1 - e^{-A^2 t^2/2})\,1_{[0,1/2]}(t) + (1 - e^{-A^2 t/2 + A^2/8})\,1_{(1/2,\infty)}(t).$$

Let $Z$ be a random variable with density

$$f(x) = \left( A^2 \int_{1-x}^{1/2} e^{-A^2 t^2/2}\,dt + 2e^{-A^2/8} \right) 1_{[1/2,1]}(x).$$

As $N$ goes to infinity $\sigma(H_1)$ converges in distribution to $Z$ and $\rho(H_1)$ converges in probability
to 1.

5. If $1/(\sqrt{N} \log N) \ll \mu \ll 1/\sqrt{N}$ then $\sqrt{N}\mu\,\tau(H_1) \to_d Y$. Both $\sigma(H_1)$ and $\rho(H_1)$ converge in
distribution to 1.

6. If $\mu \sim A/\sqrt{N}$ then for each fixed time $t > 0$ there exist constants $c$ and $C$ such that
$\liminf P(\tau(H_1) < t) > c > 0$ and $\limsup P(\tau(H_1) < t) < C < 1$. Both $\sigma(H_1)$ and $\rho(H_1)$
converge in probability to 1.

7. If $1/\sqrt{N} \ll \mu$ then $\sqrt{N}\mu\,\tau(H_1) \to_d Y$. Both $\sigma(H_1)$ and $\rho(H_1)$ converge in probability to 1.

Parts 1, 3, 5 and 7 of Proposition 2 follow directly from Theorem 1. Parts 2, 4 and 6, the
boundary cases, will be done in the last section. In part 2 the cancer causing type-1 mutation
may occur on the stem cell or a daughter cell. The event $\xi = 1$ indicates that the cancer causing
type-1 mutation occurred on a daughter cell. In part 4 the picture which appears below Theorem
1 is especially useful. We create a point process on $[0, \infty) \times [0, 1]$ whose points are associated
with the mutations. In Lemma 23 we show that the limiting distribution of this point process
is a Poisson point process whose intensity is Lebesgue measure on $[0, \infty) \times [1/2, 1]$. The main
result of part 6 is that when $\mu \sim A/\sqrt{N}$ the mutations will occur in finite time. Because of this,
the discreteness of the model cannot be ignored and computing the limit as $N$ goes to infinity
becomes difficult. However, this is a degenerate case because the model no longer resembles a
colon acquiring mutations.

In the next section we include some known results in probability that will be used throughout
the paper. In section 3 we introduce a new model which will be coupled with $H_1$. Theorem 1 will
be proved with this new model in place of $H_1$ and the coupling will give the results for $H_1$. The
fourth section of this paper is devoted to getting results about the dd regime. The fifth section
is on results about the sd and ss regimes. In section 6 we determine whether $\tau(H_1)$, $\sigma(H_1)$ and
$\rho(H_1)$ will satisfy the results of the dd, sd or ss regime. The proof of Theorem 1 is given at the
end of section 6. The last section is a discussion of the boundary cases in the null model and a
proof of Proposition 2.

2 Preliminaries 

In this section we include some general results about probability which we will make use of in 
the paper. 

Lemma 3. If $\{X_n\}_{n=1}^\infty$ is a sequence of nonnegative random variables such that $X_n \to_d X$ for
some finite random variable $X$ and $\{k_n\}_{n=1}^\infty$ is a sequence of positive constants such that $k_n \to 0$
then $k_n X_n \to_p 0$.

Proof. Let $\epsilon > 0$ and $\delta > 0$ be real numbers. Let $M$ be a real number such that the function
$F(t) = P(X \le t)$ is continuous at $M\epsilon$ and $P(X > M\epsilon) < \delta/2$. Such an $M$ exists because the
discontinuities of $F$ are countable. Choose $N_1$ so that if $n > N_1$ then $k_n < 1/M$. Choose $N_2$ so
that if $n > N_2$ then $|P(X_n \le M\epsilon) - P(X \le M\epsilon)| < \delta/2$. Then for $n > N_1 \vee N_2$

$$P(k_n X_n > \epsilon) \le P(X_n/M > \epsilon) \le |P(X_n/M > \epsilon) - P(X/M > \epsilon)| + P(X/M > \epsilon) < \delta.$$

□

Lemma 4. Let $\{\alpha_n\}_{n=1}^\infty$ and $\{\beta_n\}_{n=1}^\infty$ be sequences of positive numbers which converge to 0. Let
$\{X_n\}_{n=1}^\infty$ and $\{Y_n\}_{n=1}^\infty$ be independent sequences of random variables and let $X$ and $Y$ be positive
random variables such that $\alpha_n X_n \to_d X$ and $\beta_n Y_n \to_d Y$. If $\alpha_n \ll \beta_n$ then $P(X_n > Y_n) \to 1$.

Proof. First note that $P(X_n > Y_n) = P(\alpha_n X_n > \alpha_n Y_n)$. Also, $\alpha_n Y_n = (\alpha_n/\beta_n)\beta_n Y_n$ and
$\alpha_n/\beta_n \to 0$ so $\alpha_n Y_n \to_p 0$ by Lemma 3.

Let $\delta > 0$ and choose $\epsilon > 0$ such that the function $F(t) = P(X \le t)$ is continuous at $\epsilon$ and
$P(X > \epsilon) > 1 - \delta/2$. We can choose $N_1$ such that if $n > N_1$ then $P(\alpha_n X_n > \epsilon) > 1 - \delta$ by the
definition of convergence in distribution. Choose $N_2$ such that if $n > N_2$ then $P(\alpha_n Y_n \ge \epsilon) < \delta$.
Then for $n > N_1 \vee N_2$

$$P(\alpha_n X_n > \alpha_n Y_n) \ge P(\{\alpha_n X_n > \epsilon\} \cap \{\epsilon > \alpha_n Y_n\}) = P(\alpha_n X_n > \epsilon)\,P(\epsilon > \alpha_n Y_n) > (1 - \delta)^2$$

where $\delta$ can be made arbitrarily small. □



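Lemma 5. Let $\{A_n\}_{n=0}^\infty$, $\{B_n\}_{n=0}^\infty$ and $\{C_n\}_{n=0}^\infty$ be sequences of events such that $\lim_{n\to\infty} P(A_n) =
a > 0$, $\lim_{n\to\infty} P(B_n) = 1$ and $\lim_{n\to\infty} P(C_n) = 0$. Then

$$\lim_{n\to\infty} P(B_n | A_n) = 1 \quad \text{and} \quad \lim_{n\to\infty} P(C_n | A_n) = 0.$$

Proof. First note that

$$\lim_{n\to\infty} P(A_n \cap C_n) \le \lim_{n\to\infty} P(C_n) = 0.$$

For $n$ large enough $P(A_n)$ is never 0, so $\lim_{n\to\infty} P(C_n | A_n) = \lim_{n\to\infty} P(C_n \cap A_n)/P(A_n) = 0$.
Likewise, $\lim_{n\to\infty} P(B_n^c) = 0$ so $\lim_{n\to\infty} P(B_n^c | A_n) = 0$. Therefore, the same reasoning yields
$\lim_{n\to\infty} P(B_n | A_n) = 1$. □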

3 A Useful Model 

There is a similar model $H_2$ which will be coupled with model $H_1$. This model is the same
as $H_1$ except in the way the daughter cells acquire type-2 mutations. Label the daughter cells
$D_1, D_2, \dots, D_{N-1}$. In model $H_2$ each daughter cell $D_i$ has a counter $C_i$ starting at 0 and is acted
on by a sequence of Poisson processes $\{P_n^i\}_{n=1}^\infty$ which determine the type-2 mutations. All of the
Poisson processes are independent. In this model, when a type-1 mutation occurs on a daughter
cell $D_i$ it increases the counter $C_i$ by 1. This is considered as a type-1 mutation. If a type-1
mutation increases the counter to $n$, it is the $n$th type-1 mutation on the cell. When the counter
$C_i$ has reached $n$, any type-2 mutations that would occur according to the Poisson processes
$P_1^i, P_2^i, \dots, P_n^i$ are accepted as type-2 mutations on cell $D_i$. Any type-2 mutations that would
occur according to the Poisson processes $P_{n+1}^i, P_{n+2}^i, \dots$ are rejected. If a type-2 mutation
occurs on cell $D_i$ as a result of the Poisson process $P_n^i$, then the $n$th type-1 mutation according to
$C_i$ is considered to be successful. If the first type-2 mutation on a cell is a result of the Poisson
process $P_n^i$, then the $n$th type-1 mutation according to $C_i$ is the cancer causing type-1 mutation.
However, a type-1 mutation on the stem cell does not have a counter. Once a type-1 mutation
has spread from the stem cell to a daughter cell the daughter cell can no longer accumulate type-1
mutations and the model is the same as model $H_1$.

There is an extra convenience embedded in the model $H_2$. We can consider the $N - 1$
Poisson processes that mark the type-1 mutations on the individual daughter cells as one Poisson
process which marks the mutations on the population of daughter cells whose measure is time
independent. The rate of the Poisson process is $v_1(N - 1)$ and when a type-1 mutation occurs
it occurs on any particular cell with probability $1/(N - 1)$. In the hierarchical model, $H_1$, the
mutations are suppressed on type-1 cells so that the rate at which the population of daughter
cells is acquiring type-1 mutations depends on how many type-1 cells there are at the time.
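The superposition described above can be sketched as a sampler: a single Poisson process of rate $v_1(N - 1)$ generates the mark times, and each mark is assigned to a uniformly chosen daughter cell. The function name `sample_type1_events`, the integer cell labels and the fixed seed are assumptions of this illustration; the sketch covers only the type-1 marks of model $H_2$ and ignores splitting and type-2 mutations.

```python
import random


def sample_type1_events(v1, n_daughters, horizon, seed=0):
    """Sample (time, cell index) pairs for type-1 marks on the daughter population.

    One Poisson process of rate v1 * n_daughters is run up to `horizon`;
    each mark is assigned to a uniformly chosen daughter cell.
    """
    rng = random.Random(seed)
    total_rate = v1 * n_daughters
    events = []
    t = rng.expovariate(total_rate)          # waiting time to the first mark
    while t < horizon:
        events.append((t, rng.randrange(n_daughters)))
        t += rng.expovariate(total_rate)     # independent exponential gaps
    return events


if __name__ == "__main__":
    events = sample_type1_events(v1=0.01, n_daughters=2 ** 10 - 1, horizon=50.0, seed=1)
    print(len(events), events[:3])
```

Because marks are never suppressed in $H_2$, the total rate does not depend on how many cells are already type-1, which is what makes the single merged process time independent.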

We couple $H_1$ and $H_2$ by allowing the same Poisson processes to mark the mutations on
the cells within each model. The Poisson processes that mark the stem cells are the same. If
a daughter cell has inherited a type-1 mutation from the stem cell then the Poisson processes
marking type-2 mutations on the cell are the same in each model. The Poisson processes marking
type-1 mutations on daughter cells are the same. The Poisson processes marking type-2 mutations
on daughter cells in model $H_1$ are the same as the Poisson processes $P_1^i$ in model $H_2$ so long
as the daughter cells did not inherit their type-1 mutations from the stem cell. There are no
analogous Poisson processes in model $H_1$ for the $N - 1$ sequences of Poisson processes $P_2^i, P_3^i, \dots$
in model $H_2$.



Lemma 6. Let the Poisson processes in models $H_1$ and $H_2$ be coupled as described above. Then
$P(\tau(H_1) = \tau(H_2))$, $P(\rho(H_1) = \rho(H_2))$ and $P(\sigma(H_1) = \sigma(H_2))$ all converge to 1.

Proof. A type-2 mutation which occurs in model $H_2$ but not in $H_1$ is a result of the rejection
of the type-1 mutation in model $H_1$ that has led to the type-2 mutation in $H_2$. This mutation
is rejected because the cell on which the type-1 mutation was supposed to occur already was a
type-1 cell. Any cell has at most $\log N$ ancestors, so the probability that a type-1 mutation is
rejected is less than $1 - e^{-v_1 \log N}$. Therefore, the probability of rejecting the type-1 mutation that
causes the first type-2 mutation is converging to 0 as long as $v_1 \ll 1/\log N$. That is, if we number
the cells $1, 2, \dots, N$ and let $A_i$ be the event that the cancer causing mutation happens on cell $i$,

$$P(\tau(H_1) \ne \tau(H_2)) = \sum_{i=1}^{N} P(\tau(H_1) \ne \tau(H_2) | A_i) P(A_i) \le \sum_{i=1}^{N} (1 - e^{-v_1 \log N}) P(A_i) = 1 - e^{-v_1 \log N} \to 0.$$

If we do not have $v_1 \ll 1/\log N$ then because $v_2 \ge v_1$ we do not have $v_2 \ll 1/\log N$ either,
which contradicts equation (1). Hence we only need to consider the case $v_1 \ll 1/\log N$. □

The rest of the work in proving Theorem 1 is in proving Theorem 1 with $H_2$ in place of $H_1$.
Once this is done Theorem 1 follows from Lemma 6.

4 The dd regime 

To understand the behavior in the dd regime, we consider a new model which is the same as $H_2$
except that mutations only occur on daughter cells. That is, there are no Poisson processes that
cause mutations on the stem cells. This new model will be called model $M_1$. The purpose of this
section is to prove Proposition 7.

Proposition 7. Let $X$ be a random variable which has the exponential distribution with parameter
1. Let $Y$ be a random variable which has the Rayleigh distribution.

1. If $v_1 v_2 \ll 1/(N(\log N)^2)$ then $(\alpha \wedge 1) v_1 v_2 N (\log N)\,\tau(M_1) \to_d X$. The distribution of $\sigma(M_1)$
converges to a uniform distribution on $((1 - \alpha)^+, 1]$ and $\rho(M_1)$ converges in probability to
1.

2. If $1/(N(\log N)^2) \ll v_1 v_2 \ll 1/N$ then $\sqrt{N v_1 v_2}\,\tau(M_1) \to_d Y$. Both $\sigma(M_1)$ and $\rho(M_1)$ converge
in probability to 1.

3. If $v_1 v_2 \gg 1/N$ then $\sqrt{N v_1 v_2}\,\tau(M_1) \to_d Y$. Both $\sigma(M_1)$ and $\rho(M_1)$ converge in probability
to 1.

Lemma 8. For any positive integer $k < l$ we have $P(\rho(M_1) > (l - k)/l) > 1 - 1/2^k$.

Proof. Let $Y$ be the number of generations between the cancer causing type-1 mutation and the
first type-2 mutation. Then $Y \in \{1, 2, \dots, l\}$. Because there are only $l$ generations, if the second
mutation occurs $l - k$ generations or more after the first then it must be in the last $k$ generations.
So $P(\rho(M_1) > (l - k)/l \mid Y \in \{l - k, l - k + 1, \dots, l\}) = 1$. If we condition on the event that $Y = y$,
then the cancer causing type-1 mutation is equally likely to occur on any cell in generations
$1, 2, \dots, l - y$. This is because the descendants of the cells are independent and
identically distributed. The last $k$ of the $l - Y$ generations always make up at least a fraction of
$1 - 1/2^k$ cells, so we have $P(\rho(M_1) > (l - k)/l \mid Y \in \{1, 2, \dots, l - k - 1\}) > 1 - 1/2^k$ where we get
a strict inequality because we do not count the stem cell. The result follows. □

It is important to notice in the above lemma that we do not need $N \to \infty$. We can see from
the above lemma that $P(\rho(M_1) > (l - k)/l) > 1 - 1/2^k$ holds for any $N$ so it remains valid as
$N \to \infty$.
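The counting fact used in the proof of Lemma 8, that the last $k$ of the first $m$ daughter generations hold more than a fraction $1 - 1/2^k$ of the cells in those generations, can be checked exactly. The sketch below is only a sanity check of that fraction; the function name and the parameter `m` (the number of generations considered) are hypothetical.

```python
from fractions import Fraction


def last_generations_fraction(m, k):
    """Fraction of the 2**m - 1 daughter cells in generations 1..m
    that lie in the last k of those generations."""
    in_last_k = sum(2 ** (j - 1) for j in range(m - k + 1, m + 1))
    return Fraction(in_last_k, 2 ** m - 1)


if __name__ == "__main__":
    for m, k in ((3, 1), (10, 3), (20, 5)):
        print(m, k, last_generations_fraction(m, k), 1 - Fraction(1, 2 ** k))
```

The fraction equals $(2^m - 2^{m-k})/(2^m - 1)$, which is strictly larger than $1 - 2^{-k}$ because the stem cell is not counted in the denominator.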

Corollary 9. As $N$ goes to infinity, $\rho(M_1)$ will converge to 1 in probability.

Lemma 10. Let $(\beta_1, \beta_2] \subset (0, 1]$ and let $C$ and $C'$ be positive constants. Then

$$\sum_{j \in \mathbb{N} \cap (l\beta_1, l\beta_2]} v_1 2^{j-1}\left(1 - e^{-C v_2 (2^{l-j+1} - C')}\right) \sim C\,(\beta_2 - \beta_1 \vee (1 - \alpha))^+\, v_1 v_2 N \log N.$$

Proof. We will first define some notation for this proof for the sake of readability. Let / C M. 
We define 

I* :=/n(//3i,//32]nN. 

First we can do the case when a > 1. Let < e < 1 and break the sum into two parts, 

V1V22H V22H 

If we use the upper bound 1 - e-^^2(2'-'+i-C') < Cv2{2^-'+^ - C) < Cv22^^'^^ then 

V22H 
As for the second sum, the same upper bound yields 

^-^^"'"* ^J^ ^ < C(/32 - A V ey. 

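From the second order Taylor expansion we get a lower bound of

$$1 - e^{-Cv_2(2^{l-i+1}-C')} \ge Cv_2(2^{l-i+1} - C') - \tfrac{1}{2}C^2v_2^2(2^{l-i+1} - C')^2.$$

We can bound the second sum from below by breaking it into 5 parts,

$$2^{i-1}\left(1 - e^{-Cv_2(2^{l-i+1}-C')}\right) \ge 2^lCv_2 - 2^{i-1}CC'v_2 - 2^{2l-i}C^2v_2^2 + 2^lC^2C'v_2^2 - 2^{i-2}C^2(C')^2v_2^2.$$

We get the following computations for each of the five individual sums:

$$\sum_{i \in (l\varepsilon,l]^*} \frac{2^lCv_2}{v_2 2^l l} \to C(\beta_2 - \beta_1 \vee \varepsilon)^+,$$

$$\sum_{i \in (l\varepsilon,l]^*} \frac{2^{i-1}CC'v_2}{v_2 2^l l} \le \frac{CC'2^{l+1}}{2^l l} \to 0,$$

$$\sum_{i \in (l\varepsilon,l]^*} \frac{2^{i-2}C^2(C')^2v_2^2}{v_2 2^l l} \le \frac{2C^2(C')^2v_2}{l} \to 0,$$

$$\sum_{i \in (l\varepsilon,l]^*} \frac{2^lC^2C'v_2^2}{v_2 2^l l} \le C^2C'v_2 \to 0,$$

$$\sum_{i \in (l\varepsilon,l]^*} \frac{2^{2l-i}C^2v_2^2}{v_2 2^l l} = \frac{C^2v_2 2^l}{l} \sum_{i \in (l\varepsilon,l]^*} 2^{-i} \le C^2v_2 2^{l(1-\varepsilon)} \to 0$$

so long as $v_2 \ll 1/2^{l(1-\varepsilon)} = N^{-(1-\varepsilon)}$, which will hold since this is the case $\alpha > 1$.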
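So we have

$$C(\beta_2 - \beta_1 \vee \varepsilon)^+ \le \liminf \sum_{i \in (0,l]^*} \frac{v_1 2^{i-1}\left(1 - e^{-Cv_2(2^{l-i+1}-C')}\right)}{v_1 v_2 2^l l}$$

and because $(\beta_2 - \beta_1 \vee \varepsilon)^+ + (\beta_2 \wedge \varepsilon - \beta_1)^+ = \beta_2 - \beta_1$ we also have

$$\limsup \sum_{i \in (0,l]^*} \frac{v_1 2^{i-1}\left(1 - e^{-Cv_2(2^{l-i+1}-C')}\right)}{v_1 v_2 2^l l} \le C(\beta_2 - \beta_1).$$

Since $\varepsilon$ may be made arbitrarily small we have finished the case for $\alpha > 1$.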

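Now let $0 < \alpha < 1$ and let $\varepsilon > 0$ be small enough so that $0 < 1 - \alpha - \varepsilon < 1 - \alpha + \varepsilon < 1$. We
now break the sum into three pieces,

$$\left(\sum_{i \in [1,\,l(1-\alpha-\varepsilon))^*} + \sum_{i \in [l(1-\alpha-\varepsilon),\,l(1-\alpha+\varepsilon)]^*} + \sum_{i \in (l(1-\alpha+\varepsilon),\,l]^*}\right) 2^{i-1}\left(1 - e^{-Cv_2(2^{l-i+1}-C')}\right).$$

We can consider each of these three sums individually.
As for the middle sum, we only need the bound

$$\limsup \sum_{i \in [l(1-\alpha-\varepsilon),\,l(1-\alpha+\varepsilon)]^*} \frac{2^{i-1}\left(1 - e^{-Cv_2(2^{l-i+1}-C')}\right)}{v_2 2^l l} \le 2C\varepsilon$$

which follows by the upper bound $1 - e^{-Cv_2(2^{l-i+1}-C')} \le Cv_2 2^{l-i+1}$.

One can apply similar computations as in the case when $\alpha > 1$ to obtain the following:

$$\sum_{i \in (l(1-\alpha+\varepsilon),\,l]^*} \frac{2^{i-1}\left(1 - e^{-Cv_2(2^{l-i+1}-C')}\right)}{v_2 2^l l} \to C\left(\beta_2 - \beta_1 \vee (1-\alpha+\varepsilon)\right)^+.$$

For the first sum, note that $1 - e^{-Cv_2(2^{l-i+1}-C')} \le 1$. This gives the bound

$$\sum_{i \in [1,\,l(1-\alpha-\varepsilon))^*} \frac{2^{i-1}\left(1 - e^{-Cv_2(2^{l-i+1}-C')}\right)}{v_2 2^l l} \le \sum_{i \in [1,\,l(1-\alpha-\varepsilon))^*} \frac{2^{i-1}}{v_2 2^l l} \le \frac{2^{l(1-\alpha-\varepsilon)}}{v_2 2^l l} \to 0.$$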

The convergence is a result of the definition of a, namely that V2 ^ iV^"^*^ log^ A^. 
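Combining the three sums yields

$$C\left(\beta_2 - \beta_1 \vee (1-\alpha+\varepsilon)\right)^+ \le \liminf \sum_{i \in (0,l]^*} \frac{v_1 2^{i-1}\left(1 - e^{-Cv_2(2^{l-i+1}-C')}\right)}{l v_1 v_2 2^l}$$

and

$$\limsup \sum_{i \in (0,l]^*} \frac{v_1 2^{i-1}\left(1 - e^{-Cv_2(2^{l-i+1}-C')}\right)}{l v_1 v_2 2^l} \le C\left(\beta_2 - \beta_1 \vee (1-\alpha+\varepsilon)\right)^+ + 2C\varepsilon.$$

Again, $\varepsilon$ may be arbitrarily small which gives the result. $\square$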

Corollary 11. For any time $t$, the rate at which successful type-1 mutations occur is asymptotic
to $(\alpha \wedge 1) v_1 v_2 N \log N$.

Proof. For $1 \le i \le l$ there are $2^{i-1}$ cells in generation $i$. Each of these cells is getting type-1
mutations at rate $v_1$. Each cell in generation $i$ has $2^{l-i+1} - 2$ descendants. If the cell splits as
soon as it becomes a type-1, the probability that none of its descendants get a type-2 mutation
is $e^{-v_2(2^{l-i+1}-2)}$. On the other hand, after a cell gets a type-1 mutation it could live for at most 1
time unit until it splits. If this is the case, then the probability that neither the cell that receives
the type-1 mutation nor any of its descendants get a type-2 mutation is $e^{-v_2(2^{l-i+1}-1)}$. If we let
$R(t)$ be the rate at which the successful type-1 mutations occur at time $t$, then for any time $t$ we
have

$$1 = \lim \frac{\sum_{i=1}^{l} v_1 2^{i-1}\left(1 - e^{-v_2(2^{l-i+1}-2)}\right)}{(\alpha \wedge 1) v_1 v_2 N \log N} \le \liminf \frac{R(t)}{(\alpha \wedge 1) v_1 v_2 N \log N}$$

$$\le \limsup \frac{R(t)}{(\alpha \wedge 1) v_1 v_2 N \log N} \le \lim \frac{\sum_{i=1}^{l} v_1 2^{i-1}\left(1 - e^{-v_2(2^{l-i+1}-1)}\right)}{(\alpha \wedge 1) v_1 v_2 N \log N} = 1,$$

where the limits are results of Lemma 10. $\square$
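The asymptotic rate in Corollary 11 can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $v_2 = N^{-\alpha}$ exactly (a hypothetical choice, since the corollary only constrains how $v_2$ scales), reads $\log N$ as $\log_2 N = l$, and drops the common factor $v_1$. The function name `successful_rate_ratio` is not from the paper. Convergence in $l$ is slow, so the ratio is close to 1 but not equal to it.

```python
import math


def successful_rate_ratio(l: int, alpha: float) -> float:
    """Ratio of sum_i 2^(i-1) * (1 - exp(-v2 * (2^(l-i+1) - 2)))
    to (alpha ^ 1) * v2 * N * l, with N = 2^l and the assumed v2 = N^(-alpha)."""
    n = 2.0 ** l
    v2 = n ** (-alpha)  # assumption made for this illustration only
    total = 0.0
    for i in range(1, l + 1):
        exposure = v2 * (2.0 ** (l - i + 1) - 2.0)
        # -expm1(-x) equals 1 - exp(-x) and stays accurate for tiny x
        total += 2.0 ** (i - 1) * -math.expm1(-exposure)
    return total / (min(alpha, 1.0) * v2 * n * l)


for alpha in (0.5, 1.5):
    print(alpha, successful_rate_ratio(60, alpha))
```

For $\alpha > 1$ every generation contributes about $v_2 N$ to the sum, while for $\alpha < 1$ only the last fraction $\alpha$ of the generations does, which is where the factor $\alpha \wedge 1$ comes from.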

Lemma 12. If $v_1 v_2 \ll 1/(N(\log N)^2)$ then the distribution of $\sigma(M_1)$ converges to the uniform
distribution on $((1 - \alpha)^+, 1]$.

Proof. Let $X_1$ be the time at which the cancer causing mutation occurs and let $Y_1$ be the time at
which the first successful type-1 mutation occurs. By Corollary 11 we have that the random variable
$(\alpha \wedge 1) v_1 v_2 N (\log N) Y_1$ is converging in distribution to an exponentially distributed random
variable with parameter 1. Let $Y_2$ be the time it takes to get the second successful type-1 mutation
after the first and let $X_2 = \tau(M_1) - Y_1$. As a result of Corollary 11 again, $(\alpha \wedge 1) v_1 v_2 N (\log N) Y_2$
converges in distribution to an exponentially distributed random variable with parameter 1. Then
because a type-2 mutation must occur within $\log N$ time after a successful type-1 mutation on a
daughter cell we have

$$P(Y_2 < X_2) \le P(Y_2 < \log N) = P\left((\alpha \wedge 1) v_1 v_2 N (\log N) Y_2 < (\alpha \wedge 1) v_1 v_2 N (\log N)^2\right) \to 0.$$

Moreover, $P(Y_2 > X_2) \le P(Y_1 = X_1)$ so $P(Y_1 = X_1) \to 1$. Therefore, it is enough to find the
distribution of the first successful type-1 mutation.

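Each generation $i$ with $1 \le i \le l$ is getting successful type-1 mutations independently at a
rate bounded between $v_1 2^{i-1}(1 - e^{-v_2(2^{l-i+1}-2)})$ and $v_1 2^{i-1}(1 - e^{-v_2(2^{l-i+1}-1)})$ for any time $t$.
Therefore, for a fixed $N$ and $i$, the probability that the first successful type-1 mutation occurs
on generation $i$ is between

$$\frac{v_1 2^{i-1}\left(1 - e^{-v_2(2^{l-i+1}-2)}\right)}{\sum_{j=1}^{l} v_1 2^{j-1}\left(1 - e^{-v_2(2^{l-j+1}-1)}\right)}$$

and

$$\frac{v_1 2^{i-1}\left(1 - e^{-v_2(2^{l-i+1}-1)}\right)}{\sum_{j=1}^{l} v_1 2^{j-1}\left(1 - e^{-v_2(2^{l-j+1}-2)}\right)}.$$

Let $\beta \in [0, 1]$. Using the notation and result from Lemma 10 with $\beta_1 = 0$ and $\beta_2 = \beta$,

$$\limsup P(\sigma(M_1) \le \beta) \le \limsup \frac{\sum_{i \in (0,l]^*} v_1 2^{i-1}\left(1 - e^{-v_2(2^{l-i+1}-1)}\right)}{\sum_{i=1}^{l} v_1 2^{i-1}\left(1 - e^{-v_2(2^{l-i+1}-2)}\right)} = \frac{(\beta - (1-\alpha)^+)^+}{\alpha \wedge 1}$$

and

$$\liminf P(\sigma(M_1) \le \beta) \ge \liminf \frac{\sum_{i \in (0,l]^*} v_1 2^{i-1}\left(1 - e^{-v_2(2^{l-i+1}-2)}\right)}{\sum_{i=1}^{l} v_1 2^{i-1}\left(1 - e^{-v_2(2^{l-i+1}-1)}\right)} = \frac{(\beta - (1-\alpha)^+)^+}{\alpha \wedge 1}.$$

$\square$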
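The uniform limit in Lemma 12 can be illustrated with the per-generation rates used in the proof. This is a rough sketch, assuming $v_2 = N^{-\alpha}$ exactly and using the lower rate $2^{i-1}(1 - e^{-v_2(2^{l-i+1}-2)})$ as the weight of generation $i$ (the factor $v_1$ cancels). The names `generation_cdf` and `uniform_cdf` are hypothetical helpers, not notation from the paper.

```python
import math


def generation_cdf(l: int, alpha: float, beta: float) -> float:
    """Probability that the first successful type-1 mutation lands in a
    generation i <= beta * l, using weights proportional to the lower rate."""
    n = 2.0 ** l
    v2 = n ** (-alpha)  # assumed scaling for illustration
    weights = [
        2.0 ** (i - 1) * -math.expm1(-v2 * (2.0 ** (l - i + 1) - 2.0))
        for i in range(1, l + 1)
    ]
    cutoff = math.floor(beta * l)  # generations 1..cutoff
    return sum(weights[:cutoff]) / sum(weights)


def uniform_cdf(alpha: float, beta: float) -> float:
    """CDF at beta of the uniform distribution on ((1 - alpha)^+, 1]."""
    lo = max(1.0 - alpha, 0.0)
    return min(max(beta - lo, 0.0) / (1.0 - lo), 1.0)


for beta in (0.25, 0.5, 0.75, 1.0):
    print(beta, generation_cdf(60, 0.5, beta), uniform_cdf(0.5, beta))
```

With $\alpha = 0.5$ the early generations carry almost no mass, and the mass of the later generations grows roughly linearly in $\beta$, in line with the limit $(\beta - (1-\alpha)^+)^+/(\alpha \wedge 1)$.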



Lemma 13. If $v_1 v_2 \ll 1/(N(\log N)^2)$ then $(\alpha \wedge 1) v_1 v_2 N (\log N) \tau(M_1) \to_d X$ where $X$ is an
exponential random variable with parameter 1.

Proof. Let $X_1$ be the time at which the cancer causing type-1 mutation occurs and let $X_2 =
\tau(M_1) - X_1$. From the proof of Lemma 12 we know that the probability that the first successful
type-1 mutation is the cancer causing mutation is converging to 1. By Corollary 11 we know
that the rate of successful type-1 mutations is approaching $(\alpha \wedge 1) v_1 v_2 N \log N$. This gives us
that $(\alpha \wedge 1) v_1 v_2 N (\log N) X_1$ is converging in distribution to an exponentially distributed random
variable with parameter 1.

Let $\varepsilon > 0$. Due to apoptosis $X_2$ is bounded above by $\log N$ so

$$P\left((\alpha \wedge 1) v_1 v_2 N (\log N) X_2 > \varepsilon\right) \to 0.$$

In other words, $(\alpha \wedge 1) v_1 v_2 N (\log N) X_2 \to_p 0$. Then

$$(\alpha \wedge 1) v_1 v_2 N (\log N) \tau(M_1) = (\alpha \wedge 1) v_1 v_2 N (\log N)(X_1 + X_2) \to_d X.$$

$\square$
Combining the results of Corollary 9 and Lemmas 12 and 13 we have part 1 of Proposition
7. For the next two proofs we note that Corollary 9 already gives us that $\rho(M_1)$ converges to 1
in probability.

Proof of part 2 of Proposition 7. Consider generation $i$ for some $i \in \{1, \dots, l\}$ at time 0. The
total number of descendants of a cell in generation $i$ is $2^{l-i+1} - 2$. However, if $t < l - i$ then
the total number of descendants by time $t$ is between $2^{t-1}$ and $2^{t+1}$. At each integral time unit
there is a new collection of cells in generation $i$. We can consider a sequence of collections of
cells where the first element in the sequence is the collection of cells in generation $i$ during time
$[0, 1)$, the second element is the collection of cells in generation $i$ during time $[1, 2)$, and so on.
Because the Poisson processes marking the type-1 mutations in model $H_2$ are independent of
the type-1 mutations that have already occurred we can consider the sequence of cells in these
generations and their descendants that occur over time to be independent. Also, the random
variables denoting the times at which type-2 mutations occur as a result of type-1 mutations on
the cells in generation $i$ would be identically distributed if we were to start each new collection
of cells in generation $i$ at time 0. If $t < l - i$ then by time $t$ the number of cells which will have
descended from the $j$th element in the sequence will be between $2^{t-1-j}$ and $2^{t+1-j}$ for $j \le \lfloor t \rfloor$.
If we sum over all of the terms in the sequence which have appeared by time $t$, the total number
of cells which have descended from a cell in generation $i$ (including those which have already
undergone apoptosis) will be between

$$\sum_{j=0}^{\lfloor t \rfloor} 2^{t-1-j} \ge 2^t - 1$$

and

$$\sum_{j=0}^{\lfloor t \rfloor} 2^{t+1-j} \le 2^{t+2} - 1.$$

If $t \ge l - i$ then by time $t$ the total number of cells which will have descended from a cell in
generation $i$ will be between

$$\sum_{j=0}^{l-i-1} 2^{l-i-j-1} + (t - l + i)(2^{l-i+1} - 2) = 2^{l-i} - 1 + (t - l + i)(2^{l-i+1} - 2)$$

and

$$\sum_{j=0}^{l-i+1} 2^{l-i-j+1} + (t - l + i)(2^{l-i+1} - 2) = 2^{l-i+2} - 1 + (t - l + i)(2^{l-i+1} - 2).$$

Recall that there are always $2^{i-1}$ cells in generation $i$ which are acquiring type-1 mutations at
rate $v_1$. If we multiply the rate of type-1 mutations on generation $i$ by the probability that such a
mutation is successful, we find that the type-2 mutations that occur as a result of successful type-1
mutations that occur on generation $i$ occur according to a Poisson process that has intensity
measure between

$$2^{i-1}v_1\left(1 - e^{-v_2(2^t - 1)}\right) \quad \text{and} \quad 2^{i-1}v_1\left(1 - e^{-v_2(2^{t+2} - 1)}\right)$$

if $t < l - i$ and between

$$2^{i-1}v_1\left(1 - e^{-v_2(2^{l-i} - 1 + (t-l+i)(2^{l-i+1}-2))}\right) \quad \text{and} \quad 2^{i-1}v_1\left(1 - e^{-v_2(2^{l-i+2} - 1 + (t-l+i)(2^{l-i+1}-2))}\right)$$

if $t \ge l - i$.

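First we concentrate on the upper bound. For $N$ large enough we will have $t < \sqrt{v_1 v_2 N} \log N$
for any real number $t$ by the hypothesis $1/(N(\log N)^2) \ll v_1 v_2$. Let $t/\sqrt{v_1 v_2 N} < l$. Then

$$P\left(\tau(M_1) \le \frac{t}{\sqrt{v_1 v_2 N}}\right) = 1 - e^{-f(N,t)}$$

where by summing over the generations and using the fact that $1 - e^{-x} \le x$ we obtain

$$f(N,t) \le \sum_{0 < i < l - t/\sqrt{v_1 v_2 N}} 2^{i-1}v_1\left(1 - e^{-v_2(2^{t/\sqrt{v_1 v_2 N}+2} - 1)}\right) + \sum_{l - t/\sqrt{v_1 v_2 N} \le i \le l} 2^{i-1}v_1\left(1 - e^{-v_2\left(2^{l-i+2} - 1 + (t/\sqrt{v_1 v_2 N} - l + i)(2^{l-i+1} - 2)\right)}\right)$$

$$\le \sum_{0 < i < l - t/\sqrt{v_1 v_2 N}} 2^{i-1}\left(2^{t/\sqrt{v_1 v_2 N}+2} - 1\right)v_1 v_2 + \sum_{l - t/\sqrt{v_1 v_2 N} \le i \le l} 2^{i-1}\left(2^{l-i+2} - 1 + \left(\frac{t}{\sqrt{v_1 v_2 N}} - l + i\right)(2^{l-i+1} - 2)\right)v_1 v_2.$$

As for the first sum,

$$\sum_{0 < i < l - t/\sqrt{v_1 v_2 N}} 2^{i-1}\left(2^{t/\sqrt{v_1 v_2 N}+2} - 1\right)v_1 v_2 \le 2^{l+2}v_1 v_2 \to 0.$$

As for the second sum, we first compute

$$\sum_{l - t/\sqrt{v_1 v_2 N} \le i \le l} 2^{i-1}(2^{l-i+2} - 1)v_1 v_2 \le 2^{l+1}v_1 v_2\left(\frac{t}{\sqrt{v_1 v_2 N}} + 1\right) \to 0.$$

Lastly,

$$\sum_{l - t/\sqrt{v_1 v_2 N} \le i \le l} 2^{i-1}\left(\frac{t}{\sqrt{v_1 v_2 N}} - l + i\right)(2^{l-i+1} - 2)v_1 v_2 \le 2^l v_1 v_2 \sum_{l - t/\sqrt{v_1 v_2 N} \le i \le l} \left(\frac{t}{\sqrt{v_1 v_2 N}} - l + i\right) \le \frac{2^l v_1 v_2}{2}\left(\frac{t}{\sqrt{v_1 v_2 N}} + 1\right)^2 \to \frac{t^2}{2}.$$

Therefore, $\limsup P(\sqrt{v_1 v_2 N}\,\tau(M_1) \le t) \le 1 - e^{-t^2/2}$.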



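As for the lower bound, we have

$$f(N,t) \ge \sum_{0 < i < l - t/\sqrt{v_1 v_2 N}} 2^{i-1}v_1\left(1 - e^{-v_2(2^{t/\sqrt{v_1 v_2 N}} - 1)}\right) + \sum_{l - t/\sqrt{v_1 v_2 N} \le i \le l} 2^{i-1}v_1\left(1 - e^{-v_2\left(2^{l-i} - 1 + (t/\sqrt{v_1 v_2 N} - l + i)(2^{l-i+1} - 2)\right)}\right)$$

$$\ge \sum_{l - t/\sqrt{v_1 v_2 N} \le i \le l} 2^{i-1}v_1\left(1 - e^{-v_2(t/\sqrt{v_1 v_2 N} - l + i)(2^{l-i+1} - 2)}\right).$$

Using the bound $1 - e^{-x} \ge x - x^2/2$ we have that

$$\sum_{l - t/\sqrt{v_1 v_2 N} \le i \le l} 2^{i-1}v_1\left(1 - e^{-v_2(t/\sqrt{v_1 v_2 N} - l + i)(2^{l-i+1} - 2)}\right)$$

will be greater than or equal to the sum over $i \in [l - t/\sqrt{v_1 v_2 N}, l]$ of

$$2^{i-1}v_1\left(v_2\left(\frac{t}{\sqrt{v_1 v_2 N}} - l + i\right)(2^{l-i+1} - 2) - v_2^2\left(\frac{t}{\sqrt{v_1 v_2 N}} - l + i\right)^2 (2^{l-i+1} - 2)^2/2\right).$$

First consider

$$\sum_{l - t/\sqrt{v_1 v_2 N} \le i \le l} 2^{i-1}v_1 v_2^2\left(\frac{t}{\sqrt{v_1 v_2 N}} - l + i\right)^2 \frac{(2^{l-i+1} - 2)^2}{2}.$$

This sum is bounded between 0 and $\sum_{l - t/\sqrt{v_1 v_2 N} \le i \le l} v_2 t^2 2^{l-i}$. Let $0 < \varepsilon < \alpha$. For $N$ large enough
we have $t < \sqrt{v_1 v_2 N}\, l(\alpha - \varepsilon)$ which is equivalent to $l(1 - \alpha + \varepsilon) < l - t/\sqrt{v_1 v_2 N}$. So for $N$ large
enough we have

$$\sum_{l - t/\sqrt{v_1 v_2 N} \le i \le l} v_2 t^2 2^{l-i} \le \sum_{l(1-\alpha+\varepsilon) < i \le l} v_2 t^2 2^{l-i} \le t^2 v_2 N^{\alpha - \varepsilon} \to 0.$$

This leaves us to show

$$\liminf \sum_{l - t/\sqrt{v_1 v_2 N} \le i \le l} 2^{i-1}v_1 v_2\left(\frac{t}{\sqrt{v_1 v_2 N}} - l + i\right)(2^{l-i+1} - 2) \ge \frac{t^2}{2}.$$

Let $j \in \mathbb{N}$ and $t > 0$. By our assumptions, for large enough values of $N$ we will have $j <
t/\sqrt{v_1 v_2 N} < \log N$. Notice that if $i \le l - j$ then $2^{l-i+1} - 2 \ge (1 - 2^{-j})2^{l-i+1}$ so

$$\sum_{l - t/\sqrt{v_1 v_2 N} \le i \le l} 2^{i-1}v_1 v_2\left(\frac{t}{\sqrt{v_1 v_2 N}} - l + i\right)(2^{l-i+1} - 2) \ge \sum_{l - t/\sqrt{v_1 v_2 N} \le i \le l - j} 2^{i-1}v_1 v_2\left(\frac{t}{\sqrt{v_1 v_2 N}} - l + i\right)(1 - 2^{-j})2^{l-i+1}.$$

Because $j$ is fixed we have

$$\sum_{l - j < i \le l} 2^{l}v_1 v_2\left(\frac{t}{\sqrt{v_1 v_2 N}} - l + i\right)(1 - 2^{-j}) \to 0$$

since each of the summands converges to 0. Therefore, we can add this sum without changing
the limit. This gets us a lower bound of

$$\liminf \sum_{l - t/\sqrt{v_1 v_2 N} \le i \le l} 2^{l}v_1 v_2\left(\frac{t}{\sqrt{v_1 v_2 N}} - l + i\right)(1 - 2^{-j}) \ge \frac{t^2}{2}(1 - 2^{-j}).$$

We chose $j$ to be any natural number, so $\liminf P(\sqrt{v_1 v_2 N}\,\tau(M_1) \le t) \ge 1 - e^{-t^2/2}$.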
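The quantity driving both bounds is the sum $2^l v_1 v_2 \sum_i (t/\sqrt{v_1 v_2 N} - l + i)$ over the last $t/\sqrt{v_1 v_2 N}$ generations. A small numerical sketch shows it approaching $t^2/2$ as $v_1 v_2 N \to 0$. The function `leading_term` and its argument `rate_product` (standing for $v_1 v_2 N$) are illustrative names, and the reindexing $k = l - i$ is an assumption about how to read the sum.

```python
import math


def leading_term(t: float, rate_product: float) -> float:
    """v1*v2*N * sum_{k=0}^{floor(s)} (s - k) with s = t / sqrt(v1*v2*N),
    where k = l - i indexes generations counted back from the last one."""
    s = t / math.sqrt(rate_product)
    return rate_product * sum(s - k for k in range(math.floor(s) + 1))


for rate_product in (1e-2, 1e-4, 1e-6):
    value = leading_term(1.0, rate_product)
    # the matching Rayleigh-type CDF value is 1 - exp(-t^2 / 2) at t = 1
    print(rate_product, value, -math.expm1(-value), -math.expm1(-0.5))
```

The error is of order $t\sqrt{v_1 v_2 N}$, which vanishes under the hypothesis $v_1 v_2 \ll 1/N$.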

The above two bounds establish that $P(\sqrt{v_1 v_2 N}\,\tau(M_1) \le t) \to 1 - e^{-t^2/2}$ for any $t > 0$. This
leaves us to show that $\sigma(M_1)$ converges in probability to 1. First note that for any $\varepsilon > 0$ we have

$$P(\tau(M_1) \le \varepsilon \log N) = P\left(\sqrt{N v_1 v_2}\,\tau(M_1) \le \sqrt{N v_1 v_2}\,\varepsilon \log N\right) \to 1$$

which follows because the distribution of $\sqrt{N v_1 v_2}\,\tau(M_1)$ is converging to the Rayleigh distribution
and $\sqrt{N v_1 v_2}\,\varepsilon \log N$ is converging to $\infty$. Let $\delta > 0$. By Corollary 9 we know that $\rho(M_1)$ converges
in probability to 1 so that as $N$ goes to infinity, $P(\rho(M_1) > 1 - \delta) \to 1$. If $\sigma(M_1) < 1 - 2\delta$
and $\rho(M_1) > 1 - \delta$ then $\tau(M_1) > \delta \log N$. Because $P(\tau(M_1) > \delta \log N) \to 0$ we must also have
$P(\sigma(M_1) < 1 - 2\delta) \to 0$ where $\delta > 0$ was arbitrary. Then $P(1 - \sigma(M_1) > 2\delta) \to 0$ for any $\delta > 0$
so $\sigma(M_1) \to_p 1$. $\square$

Proof of part 3 of Proposition 7. We shall make use of the following well known fact: If $\{a_n\}_{n=1}^{\infty}$
is a sequence of real numbers such that $a_n \to a$, then

$$\lim_{n \to \infty}\left(1 - \frac{a_n}{n}\right)^{n-1} = e^{-a}.$$
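The limit fact quoted above is easy to see numerically. The snippet below is a minimal sketch, using the hypothetical sequence $a_n = a + 1/\sqrt{n}$ (any sequence converging to $a$ would do) and an illustrative helper named `power_sequence`.

```python
import math


def power_sequence(a_n: float, n: int) -> float:
    """Evaluate (1 - a_n / n)^(n - 1)."""
    return (1.0 - a_n / n) ** (n - 1)


a = 0.5
for n in (10, 1_000, 100_000, 10_000_000):
    a_n = a + 1.0 / math.sqrt(n)  # hypothetical sequence with a_n -> a
    print(n, power_sequence(a_n, n), math.exp(-a))
```

In the proof this is applied with $n = N$ and $a = t^2/2$.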

Before time 1 the cells never split and there is no apoptosis. If we ignore the splitting and
apoptosis and consider how long it takes for a cell to acquire two mutations under the mutation
mechanism alone then we have $N - 1$ cells acquiring mutations independently. For any individual
cell, the time it takes to acquire two mutations will have the same distribution as the sum of two
independent exponentially distributed random variables with parameters $v_1$ and $v_2$. If we denote
the time until cell $i$ has a type-2 mutation by $T_i$ and assume $v_1 \ne v_2$ then

$$P(T_i \le t) = 1 - \frac{v_2 e^{-v_1 t} - v_1 e^{-v_2 t}}{v_2 - v_1}.$$

There are $N - 1$ cells independently getting mutations, so for $t \le 1$ we have

$$P(\tau(M_1) \le t) = 1 - \left(\frac{v_2 e^{-v_1 t} - v_1 e^{-v_2 t}}{v_2 - v_1}\right)^{N-1}$$

or equivalently,

$$P\left(\sqrt{v_1 v_2 N}\,\tau(M_1) \le t\right) = 1 - \left(\frac{v_2 e^{-\sqrt{v_1/(v_2 N)}\,t} - v_1 e^{-\sqrt{v_2/(v_1 N)}\,t}}{v_2 - v_1}\right)^{N-1}.$$
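The closed form above can be compared directly with the Rayleigh-type limit $1 - e^{-t^2/2}$. The following is a sketch only: the parameter values $v_1 = 10^{-2}$, $v_2 = 2 \cdot 10^{-2}$, $N = 10^8$ are arbitrary choices satisfying $v_1 v_2 \gg 1/N$, and the function names are hypothetical. The single-cell CDF is rewritten as $(v_2(1 - e^{-v_1 s}) - v_1(1 - e^{-v_2 s}))/(v_2 - v_1)$, which is algebraically the same expression but avoids cancellation for small $s$.

```python
import math


def two_hit_cdf(v1: float, v2: float, s: float) -> float:
    """P(E1 + E2 <= s) for independent exponentials with rates v1 != v2."""
    return (v2 * -math.expm1(-v1 * s) - v1 * -math.expm1(-v2 * s)) / (v2 - v1)


def scaled_min_cdf(v1: float, v2: float, n: int, t: float) -> float:
    """P(sqrt(v1*v2*n) * tau <= t), where tau is the minimum over n - 1 cells."""
    s = t / math.sqrt(v1 * v2 * n)
    log_survival = (n - 1) * math.log1p(-two_hit_cdf(v1, v2, s))
    return -math.expm1(log_survival)


def rayleigh_cdf(t: float) -> float:
    """Limiting CDF 1 - exp(-t^2 / 2)."""
    return -math.expm1(-t * t / 2.0)


for t in (0.5, 1.0, 2.0):
    print(t, scaled_min_cdf(1e-2, 2e-2, 10**8, t), rayleigh_cdf(t))
```

With these values the rescaled times $t/\sqrt{v_1 v_2 N}$ are far below 1, so the no-splitting approximation used in the proof applies.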
By using the third degree Taylor expansion of the exponential function we get the bounds







Notice that $N\sqrt{v_1^3/(v_2 N^3)} = v_1^2/\sqrt{v_1 v_2 N} \to 0$ and $N\sqrt{v_2^3/(v_1 N^3)} = v_2^2/\sqrt{v_1 v_2 N} \to 0$. Then for
any fixed $t$ we have

I \ Af-l 



2Af Y V2N^ 6 
and 



Af-l 



2N^\I viN^ 6 

If $v_1 = v_2$ then the probability that one cell has two mutations by time $t$ is $1 - e^{-v_1 t} -
v_1 t e^{-v_1 t}$ if we ignore splitting and apoptosis. The probability that one of the $N$ cells has two
mutations by time $t$ is $1 - (e^{-v_1 t} + v_1 t e^{-v_1 t})^{N}$. By applying the same techniques as above we get
$P(\sqrt{v_1 v_2 N}\,\tau(M_1) \le t) \to 1 - e^{-t^2/2}$ when $v_1 = v_2$.

Combining the two results above we have $P(\sqrt{v_1 v_2 N}\,\tau(M_1) \le t) \to 1 - e^{-t^2/2}$ when ignoring
splitting and apoptosis. Then $P(\tau(M_1) < 1) = P(\sqrt{v_1 v_2 N}\,\tau(M_1) < \sqrt{v_1 v_2 N}) \to 1$. Therefore,
the probability that two mutations occur before time 1 is converging to 1 so we may ignore
splitting and apoptosis in this case. This gives the desired result for $\tau(M_1)$.

By Corollary 9 we know that $\rho(M_1)$ converges in probability to 1. Because the two mutations
occur before splitting or apoptosis, the probability that the cancer causing type-1 mutation and
the first type-2 mutation are on the same cell converges to 1. Therefore, $\sigma(M_1)$ converges to 1 in
probability. $\square$

5 The sd and ss regimes 

In this section we need two different models. The first one is the same as model H2 except that 
only the stem cell receives type-1 mutations and only the daughter cells receive type-2 mutations. 
The second is the same as H2 except that only the stem cell receives mutations. These will be 
referred to as models M2 and M3 respectively. 

Proposition 14. Let $X$ be a random variable which has an exponential distribution with parameter 1.

1. If $u_1 \ll 1/\log N$ and $u_1 \ll N v_2$ then $u_1 \tau(M_2) \to_d X$ and $\rho(M_2)$ converges in probability to
$\alpha \wedge 1$.

2. If $u_1 \ll u_2$ then $u_1 \tau(M_3) \to_d X$.

3. Let $A > 1$ and $Z$ be an exponentially distributed random variable with parameter $1/A$ which
is independent of $X$. If $u_1 \sim A u_2$ then $u_1 \tau(M_3) \to_d X + Z$.

The goal of this section is to prove Proposition 14. It will be shown later that the conditions used
in Proposition 14 for the sd regime are the only relevant conditions.

Define $X_1$ to be the time at which the cancer causing type-1 mutation occurs and define $X_2$
to be the time after the cancer causing type-1 mutation until the first type-2 mutation. Note
that because the stem cell is the only cell that gets type-1 mutations in models $M_2$ and $M_3$,
the first successful type-1 mutation is also the cancer causing type-1 mutation.

Lemma 15. Consider the model $M_2$. For time $t \le \log N$ after the stem cell receives a type-1
mutation we have

$$P(X_2 > t) \ge e^{-2^{t+1}v_2} \quad \text{and} \quad P(X_2 > t) \le e^{-(2^{t-2} - 1)v_2}.$$

Proof. First we establish the upper bound. After the stem cell gets the first mutation it takes
at most one time unit until the mutation is passed along to the first generation daughter cell.
Assuming it does take one time unit until the first generation daughter cell inherits the mutation
we can get an upper bound on $P(X_2 > t)$. Let time $t = 0$ denote the time at which the stem cell
receives the type-1 mutation. There are no mutations being acquired by the daughter cells for time
$t \in [0, 1)$. For time $t \in [1, 2)$ the generation 1 daughter cell is the only type-1 daughter cell. So
for $t \in [1, 2)$ we have $P(X_2 > t) = e^{-(t-1)v_2}$. For time $t \in [2, 3)$ the first two generations have the
mutation which is a total of 3 cells. Therefore, for $t \in [2, 3)$ we have $P(X_2 > t) = e^{-(3(t-2)v_2 + v_2)}$
where the $v_2$ is added because of the probability of having a mutation before time 2. Extending
this inductively gives us

$$P(X_2 > t) \le e^{-\left[(2^{\lfloor t \rfloor} - 1)(t - \lfloor t \rfloor) + \sum_{i=2}^{\lfloor t \rfloor}(2^{i-1} - 1)\right]v_2} \le e^{-(2^{t-2} - 1)v_2}$$

for any $t \le \log N$.

For the lower bound we use the same reasoning as above except that we assume it takes 0
time for the generation 1 daughter cell to become a type-1 after the stem cell is a type-1. This
gets us

$$P(X_2 > t) \ge e^{-\left[(2^{\lceil t \rceil} - 1)(t - \lfloor t \rfloor) + \sum_{i=1}^{\lfloor t \rfloor}(2^{i} - 1)\right]v_2} \ge e^{-2^{t+1}v_2}.$$

$\square$
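The inductive step for the upper bound in Lemma 15 amounts to accumulating the total time during which type-1 daughter cells are exposed to type-2 mutations. The sketch below assumes the delayed scenario from the proof (generation 1 inherits the mutation exactly one time unit after the stem cell mutates) and checks on a grid that the accumulated exposure dominates $2^{t-2} - 1$. The function names are illustrative, and only the upper-bound direction is examined here.

```python
import math


def exposure_delayed(t: float) -> float:
    """Accumulated cell-time of type-1 daughter cells by time t when
    generation 1 becomes type-1 one full time unit after the stem cell."""
    n = math.floor(t)
    if n < 1:
        return 0.0  # no type-1 daughter cells before time 1
    # 2^n - 1 type-1 daughter cells are alive during [n, n + 1)
    current = (2 ** n - 1) * (t - n)
    # cell-time accumulated over the earlier unit intervals [1, 2), ..., [n - 1, n)
    earlier = sum(2 ** (i - 1) - 1 for i in range(2, n + 1))
    return current + earlier


def upper_bound_holds(t: float) -> bool:
    """True when exp(-exposure * v2) <= exp(-(2^(t-2) - 1) * v2) for every v2 > 0."""
    return exposure_delayed(t) >= 2 ** (t - 2) - 1


print(exposure_delayed(1.5), exposure_delayed(2.5))
print(all(upper_bound_holds(k / 10) for k in range(0, 201)))
```

The values at $t = 1.5$ and $t = 2.5$ match the exponents $(t-1)$ and $3(t-2) + 1$ computed explicitly in the proof.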

Lemma 16. The location of the second mutation satisfies $\rho(M_2) \to_p \alpha \wedge 1$.

Proof. By Lemma 15 we have $P(X_2 > \log N) \ge e^{-2Nv_2}$. If $\alpha > 1$ then $P(X_2 > \log N)$ converges
to 1 and the mutation will spread throughout the entire crypt. If this is the case then any cell is
equally likely to have the second mutation. Therefore $P(\rho(M_2) \le \beta) \le (2^{\beta l} - 1)/(2^l - 1)$ for any
$\beta \in [0, 1)$ so $\rho(M_2) \to_p 1$.

Now suppose $\alpha \le 1$. Let $\varepsilon > 0$ so that $\alpha - \varepsilon > 0$. Then by Lemma 15,

$$P(X_2 > l(\alpha - \varepsilon)) \ge e^{-2N^{\alpha - \varepsilon}v_2}.$$

Because $2N^{\alpha - \varepsilon}v_2 \to 0$ we get the convergence $P(X_2 > l(\alpha - \varepsilon)) \to 1$. By time $l(\alpha - \varepsilon)$ the
mutation will have spread to the first $\lfloor l(\alpha - \varepsilon) \rfloor$ generations so that for times after $l(\alpha - \varepsilon)$ we
know that at least $2^{\lfloor l(\alpha - \varepsilon) \rfloor}$ cells have the type-1 mutation. Therefore,

$$P\left(\{\rho(M_2) \le \beta\} \cap \{X_2 > l(\alpha - \varepsilon)\}\right) \le \frac{2^{\beta l} - 1}{2^{(\alpha - \varepsilon)l - 1} - 1}.$$

Thus, for any $\beta < \alpha - \varepsilon$,

$$P(\rho(M_2) \le \beta) \le \frac{2^{\beta l} - 1}{2^{(\alpha - \varepsilon)l - 1} - 1} + P(X_2 \le l(\alpha - \varepsilon)) \to 0.$$

Hence $P(\rho(M_2) > \alpha - \varepsilon) \to 1$. Because $\varepsilon$ may be arbitrarily small we have finished the case when
$\alpha = 1$.

Suppose $\alpha < 1$ and let $\varepsilon > 0$ so that $\alpha + \varepsilon < 1$. Then by Lemma 15,

$$P(X_2 > l(\alpha + \varepsilon)) \le e^{-(2^{l(\alpha + \varepsilon) - 2} - 1)v_2}.$$

Because $N^{\alpha + \varepsilon}v_2/4 \to \infty$, we have $P(X_2 > l(\alpha + \varepsilon)) \to 0$. By time $l(\alpha + \varepsilon)$ the mutation has only
spread to the first $l(\alpha + \varepsilon)$ generations, so $P(\rho(M_2) > \alpha + \varepsilon) \to 0$ where $\varepsilon$ is arbitrarily small. $\square$

Lemma 17. If $u_1 \ll 1/\log N$ and $u_1 \ll N v_2$ then $u_1 \tau(M_2) \to_d X$ where $X$ has exponential
distribution with parameter 1.

Proof. Since the stem cell is getting mutations according to a Poisson process at rate $u_1$ we have
that $u_1 X_1$ is an exponentially distributed random variable with parameter 1. This leaves us to
show $u_1 X_2 \to_p 0$.

Suppose we consider a new model $M_2'$ which is the same as model $M_2$ except that the type-2
mutations can only occur on daughter cells $\log N$ time after the stem cell has a type-1 mutation.
We can couple models $M_2$ and $M_2'$ so that the same Poisson processes are marking the mutations
on the daughter cells in each model but that any proposed type-2 mutation is rejected in model
$M_2'$ until $\log N$ time after the stem cell mutation. This way $X_1$ is the same in models $M_2$ and $M_2'$.
Also, if we let $X_2' = \tau(M_2') - X_1$ then $X_2' \ge X_2$. Therefore it is enough to show that $u_1 X_2' \to_p 0$.

If we wait $\log N$ time after the stem cell receives its type-1 mutation then all of the daughter
cells will be type-1. Then the $(N - 1)$ daughter cells are getting type-2 mutations at rate $v_2$.
Thus for any fixed $N$ we have

$$P(X_2' > t) = 1_{[0, \log N]}(t) + e^{-v_2(N-1)(t - \log N)}\,1_{(\log N, \infty)}(t).$$

Let $\varepsilon > 0$. Then

$$P(u_1 X_2' > \varepsilon) = 1_{[0, \log N]}\left(\frac{\varepsilon}{u_1}\right) + e^{-v_2(N-1)(\varepsilon/u_1 - \log N)}\,1_{(\log N, \infty)}\left(\frac{\varepsilon}{u_1}\right).$$

By our assumptions, $u_1 \log N \to 0$ so for $N$ large enough this becomes

$$P(u_1 X_2' > \varepsilon) = e^{-v_2(N-1)(\varepsilon/u_1 - \log N)}.$$

Also by our assumptions, $-v_2(N - 1)(\varepsilon/u_1 - \log N) \sim -v_2 N \varepsilon/u_1 \to -\infty$, so

$$P(u_1 X_2' > \varepsilon) \to 0.$$

$\square$
Proof of Proposition 14. Combining Lemmas 16 and 17 we get part 1 of Proposition 14.

As in model $M_2$, $u_1 X_1$ has the exponential distribution with parameter 1. To prove part 2 of
Proposition 14 we need to show that $u_1 X_2 \to_p 0$. Let $\varepsilon > 0$. Then

$$P(u_1 X_2 > \varepsilon) = P(X_2 > \varepsilon/u_1) = e^{-\varepsilon u_2/u_1}.$$

Since $u_2/u_1 \to \infty$ we have $P(u_1 X_2 > \varepsilon) \to 0$.

Lastly we prove part 3 of Proposition 14. In model $M_3$ both mutations occur on the stem
cell. In this case $u_1 X_1$ and $u_2 X_2$ are both exponentially distributed with parameter 1. Because
$u_1 X_2 = (u_1/u_2) u_2 X_2$ we have that $u_1 X_2$ is exponentially distributed with parameter $u_2/u_1$. By
assumption, $u_2/u_1 \to 1/A$ so $u_1 X_2$ converges in distribution to $Z$. The random variables $X_1$ and
$X_2$ are independent for each $N$ so

$$u_1 \tau(M_3) = u_1 X_1 + u_1 X_2 \to_d X + Z.$$

$\square$

6 Proof of the Theorem 

We will couple the models $H_2$, $M_1$, $M_2$ and $M_3$ so that the Poisson processes used in models
$M_1$, $M_2$ and $M_3$ are the appropriate subcollections of Poisson processes which are used in model
$H_2$. Let $T$ be the time that the stem cell becomes a type-1. Note that, because the stem cell cannot
inherit a type-1 mutation and $H_2$, $M_2$ and $M_3$ are coupled, $T$ will be the same for models
$H_2$, $M_2$ and $M_3$.

Let $X$ be exponentially distributed with parameter 1 and let $Y$ be a random variable with
the Rayleigh distribution.

Lemma 18. Suppose $v_1 v_2 \ll 1/(N(\log N)^2)$. If $u_1 \ll v_1 v_2 N \log N$ then $P(\tau(M_1) < T) \to 1$. If
$u_1 \gg v_1 v_2 N \log N$ then $P(\tau(M_3) < \tau(M_1)) \to 1$.

Proof. By part 1 of Proposition 7, $(\alpha \wedge 1) v_1 v_2 N (\log N) \tau(M_1) \to_d X$. Mutations to the stem
cell occur at rate $u_1$ so $u_1 T \to_d X$. Because the Poisson processes which mark the
mutations in model $M_1$ are independent of the Poisson process that marks the mutations on the
stem cell, if $u_1 \ll v_1 v_2 N \log N$ then $P(\tau(M_1) < T) \to 1$ by Lemma HI

On the other hand, suppose $u_1 \gg v_1 v_2 N \log N$. We are assuming $u_1 \le u_2$ so we could decrease
$P(\tau(M_3) < \tau(M_1))$ by decreasing $u_2$ to $u_1$. Then the distribution of $u_1 \tau(M_3)$ is the distribution
of the sum of two independent exponentially distributed random variables, $P(u_1 \tau(M_3) \le t) \ge
1 - e^{-t} - te^{-t}$. By Lemmad $P(\tau(M_3) < \tau(M_1)) \to 1$. $\square$

Lemma 19. Suppose $1/(N(\log N)^2) \ll v_1 v_2 \ll 1/N$. If $u_1 \ll \sqrt{v_1 v_2 N}$ then $P(\tau(M_1) < T) \to 1$.
If $u_1 \gg \sqrt{v_1 v_2 N}$ then $P(\tau(M_3) < \tau(M_1)) \to 1$.

Proof. Let $u_1 \ll \sqrt{v_1 v_2 N}$. By part 2 of Proposition 7 we have $\sqrt{v_1 v_2 N}\,\tau(M_1) \to_d Y$. The stem
cell is getting mutations at rate $u_1$ so $u_1 T \to_d X$. The Poisson processes that are marking the
mutations in model $M_1$ are independent of the Poisson process that marks mutations on the stem
cell, so the result follows by Lemma HI

If $u_1 \gg \sqrt{v_1 v_2 N}$ then the proof follows by the same reasoning as used in Lemma 18 when
considering $u_1 \gg v_1 v_2 N \log N$. $\square$
Lemma 20. If $v_1 v_2 \gg 1/N$ then $P(\tau(M_1) < T) \to 1$.

Proof. By part 3 of Proposition 7 we have $\sqrt{v_1 v_2 N}\,\tau(M_1) \to_d Y$. The stem cell is getting mutations
at rate $u_1$ so $u_1 T \to_d X$. The Poisson processes that are marking the mutations in model $M_1$ are
independent of the Poisson process that marks mutations on the stem cell, so the result follows
by Lemma [H since $u_1 \ll 1 \ll \sqrt{v_1 v_2 N}$. $\square$

Lemma 21. If $u_2 \ll 1/\log N$ and $u_2 \ll N v_2$ then $P(\tau(M_2) < \tau(M_3)) \to 1$.

Proof. Because models $M_2$ and $M_3$ are coupled, the stem cell in each model will receive a type-1
mutation at the same time. After this the Poisson processes marking the mutations in models
$M_2$ and $M_3$ are independent. Let $T_2$ be the time it takes for a type-2 mutation to occur in model
$M_2$ after the stem cell has a type-1 mutation and let $T_3$ be the time it takes for a type-2 mutation
to occur in model $M_3$ after the stem cell has a type-1 mutation. Then $P(\tau(M_2) < \tau(M_3)) =
P(T_2 < T_3)$.

Consider again the model $M_2'$ that was introduced in the proof of Lemma 17 which is the
same as model $M_2$ except that the type-2 mutations can only occur on daughter cells $\log N$ time
after the stem cell has a type-1 mutation. We can couple models $M_2$ and $M_2'$ as we did before so
that the time of the stem cell mutation is the same in models $M_2$ and $M_2'$. Let $T_2'$ be the time it
takes to acquire a type-2 mutation in model $M_2'$ after the stem cell has a type-1 mutation. Then
$T_2' \ge T_2$ so it is enough to show that $P(T_2' < T_3) \to 1$.

If we wait $\log N$ time after the stem cell receives its type-1 mutation then all of the daughter
cells will be type-1. Then the $(N - 1)$ daughter cells are getting type-2 mutations at rate $v_2$.
Thus for any fixed $N$ we have

$$P(T_2' > t) = 1_{[0, \log N]}(t) + e^{-v_2(N-1)(t - \log N)}\,1_{(\log N, \infty)}(t).$$

Then

$$P(T_2' < T_3) = P(T_2' < T_3 \mid T_3 \le \log N)P(T_3 \le \log N) + P(T_2' < T_3 \mid T_3 > \log N)P(T_3 > \log N).$$

Because $u_2 \ll 1/\log N$ and $u_2 T_3$ has the exponential distribution with parameter 1, $P(T_3 >
\log N) \to 1$. The memoryless property of the exponential distribution gives us that

$$P(T_2' < T_3 \mid T_3 > \log N) = \frac{v_2(N - 1)}{v_2(N - 1) + u_2} \to 1,$$

where the limit holds because $u_2 \ll N v_2$, which completes the proof. $\square$
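The last display is the standard race between two independent exponential clocks: after time $\log N$, the daughter cells produce a type-2 mutation at total rate $v_2(N-1)$ while the stem cell does so at rate $u_2$. The sketch below states the race probability and checks it against a seeded simulation. The function names and the numerical rates are illustrative assumptions, not values from the paper.

```python
import random


def race_probability(rate_a: float, rate_b: float) -> float:
    """P(A < B) for independent exponential clocks A, B with the given rates."""
    return rate_a / (rate_a + rate_b)


def simulate_race(rate_a: float, rate_b: float, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(A < B) using a seeded generator."""
    rng = random.Random(seed)
    wins = sum(rng.expovariate(rate_a) < rng.expovariate(rate_b) for _ in range(trials))
    return wins / trials


# daughter cells at total rate v2 * (N - 1) against the stem cell at rate u2
v2, n, u2 = 1e-3, 10**6 + 1, 1.0
print(race_probability(v2 * (n - 1), u2))
print(simulate_race(3.0, 1.0), race_probability(3.0, 1.0))
```

When $u_2 \ll N v_2$ the first rate dominates and the probability tends to 1, as used at the end of the proof.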

Lemma 22. If $u_2 \gg 1/\log N$ or $u_2 \gg N v_2$ then $P(\tau(M_3) < \tau(M_2)) \to 1$.

Proof. Because models $M_2$ and $M_3$ are coupled, the stem cell in each model will receive a type-1
mutation at the same time. After this the Poisson processes marking the mutations in models
$M_2$ and $M_3$ are independent. Let $T_2$ be the time it takes for a type-2 mutation to occur in model
$M_2$ after the stem cell has a type-1 mutation and let $T_3$ be the time it takes for a type-2 mutation
to occur in model $M_3$ after the stem cell has a type-1 mutation. Then $P(\tau(M_3) < \tau(M_2)) =
P(T_3 < T_2)$.

Suppose $u_2 \gg 1/\log N$. By Lemma 16 we know that $\rho(M_2) \to_p \alpha \wedge 1$. Therefore, if $0 < \delta <
(\alpha \wedge 1)$ then $P(\rho(M_2) > (\alpha \wedge 1) - \delta) \to 1$. If $\rho(M_2) > (\alpha \wedge 1) - \delta$ then the second mutation occurs
on a generation higher than $((\alpha \wedge 1) - \delta)l$. Since the stem cell is the only cell that gets type-1
mutations in model $M_2$, this means that $T_2 \ge \lfloor ((\alpha \wedge 1) - \delta)l \rfloor$ because it takes that much time for
the type-1 mutation to spread to the generation $((\alpha \wedge 1) - \delta)l$ daughter cells. On the other hand,
in model $M_3$ the second mutation is occurring at rate $u_2$ so that $u_2 T_3$ is exponentially distributed
with parameter 1. Then $P(T_3 < K \log N) = P(u_2 T_3 < u_2 K \log N) \to 1$ for any positive number
$K$ since $u_2 \log N \to \infty$. Therefore $P(T_3 < T_2) \to 1$.

Suppose $u_2 \gg N v_2$. The rate at which type-2 mutations occur in model $M_2$ is always
bounded by $(N - 1)v_2$. Suppose we consider a new model $M_2''$ which is the same as $M_2$ except
that once the stem cell has a type-1 mutation, all of the daughter cells also have a type-1 mutation
instantaneously. This can be coupled so that after the stem cell gets a type-1 mutation then any
type-2 mutation proposed by a Poisson process on a daughter cell is accepted in model $M_2''$.
Then if we let $T_2''$ be the time it takes for a type-2 mutation to occur in model $M_2''$ after the stem
cell has a type-1 mutation, $(N - 1)v_2 T_2''$ has the exponential distribution with parameter 1. By
Lemma H $P(T_3 < T_2'') \to 1$. Because $T_2 \ge T_2''$ we have the desired result. $\square$

Proof of Proposition 2. From the coupling we have $\tau(H_2) = \tau(M_1) \wedge \tau(M_2) \wedge \tau(M_3)$ because any
type-2 mutation which occurs in model $H_2$ must occur in at least one of the models $M_i$ for some
$i$, and if a mutation occurs in model $M_i$ then it will also occur in model $H_2$.

Suppose $P(\tau(M_1) < T) \to 1$. Before time $T$ models $M_2$ and $M_3$ are only acquiring
mutations on the stem cell. Therefore, models $M_2$ and $M_3$ only have type-0 cells before time $T$
and $P(\tau(M_1) < \tau(M_2) \wedge \tau(M_3)) \to 1$.

• By Lemma 18 if $v_1 v_2 \ll 1/(N(\log N)^2)$ and $u_1 \ll v_1 v_2 N \log N$ then $P(\tau(M_1) < T) \to 1$ so by
part 1 of Proposition 7 and the coupling of $H_2$ with $M_1$ we have $(\alpha \wedge 1) v_1 v_2 N (\log N) \tau(H_2) \to_d
X$. The distribution of $\sigma(H_2)$ converges to a uniform distribution on $((1 - \alpha)^+, 1]$ and $\rho(H_2)$
converges in distribution to 1.

• By Lemma 19 if $1/(N(\log N)^2) \ll v_1 v_2 \ll 1/N$ and $u_1 \ll \sqrt{v_1 v_2 N}$ then $P(\tau(M_1) < T) \to 1$
so by part 2 of Proposition 7 and the coupling of $H_2$ with $M_1$ we have $\sqrt{v_1 v_2 N}\,\tau(H_2) \to_d Y$.
Both $\sigma(H_2)$ and $\rho(H_2)$ converge in distribution to 1.

• By Lemma 20 if $v_1 v_2 \gg 1/N$ then $P(\tau(M_1) < T) \to 1$ so by part 3 of Proposition 7 and
the coupling of $H_2$ with $M_1$ we have $\sqrt{N v_1 v_2}\,\tau(H_2) \to_d Y$. Both $\sigma(H_2)$ and $\rho(H_2)$ converge
in distribution to 1.

If either $v_1 v_2 \ll 1/(N(\log N)^2)$ and $u_1 \gg v_1 v_2 N \log N$ or $1/(N(\log N)^2) \ll v_1 v_2 \ll 1/N$ and
$u_1 \gg \sqrt{v_1 v_2 N}$ then $P(\tau(M_3) < \tau(M_1)) \to 1$ by Lemmas 18 and 19 respectively. Therefore,
$P(\tau(M_2) \wedge \tau(M_3) < \tau(M_1)) \to 1$ (meaning that the cancer causing type-1 mutation occurs on
the stem cell). Given these four conditions, we are left only to compare $\tau(M_2)$ and $\tau(M_3)$.

• By Lemma 21 if $u_2 \ll 1/\log N$ and $u_2 \ll N v_2$ then $P(\tau(M_2) < \tau(M_3)) \to 1$. Because
$u_1 \le u_2$ the hypotheses are true for $u_1$ as well. Therefore, by the coupling of $H_2$ with $M_2$
and part 1 of Proposition 14 we have $u_1 \tau(H_2) \to_d X$. The distribution of $\rho(H_2)$ converges
to $\alpha \wedge 1$.

• By Lemma 22 if $u_2 \gg 1/\log N$ or $u_2 \gg N v_2$ then $P(\tau(M_3) < \tau(M_2)) \to 1$. If $u_1 \ll u_2$
then by the coupling of $H_2$ with $M_3$ and part 2 of Proposition 14 we have $u_1 \tau(H_2) \to_d X$.
If $u_1 \sim A u_2$ then by the coupling of $H_2$ with $M_3$ and part 3 of Proposition 14 we have
$u_1 \tau(H_2) \to_d X + Z$ where $Z$ is an exponentially distributed random variable with parameter
$1/A$ that is independent of $X$.

By Lemma 6 the results hold for model $H_1$ as well. $\square$

7 The Null Model 

For this section we always have $u_1 = u_2 = v_1 = v_2 = \mu$ and we prove Proposition [5] for model
$H_2$. Then Proposition [2] will hold for model $H_1$ as well by Lemma 6. We begin this section
by pointing out that the conditions of part 5 of Theorem 1 always fail in the null model. The
two conditions in the first conjunction become $\mu \ll 1/(N \log N)$. Of the two conditions in the
second conjunction, one becomes $\sqrt{N} \ll 1$ which always fails. This reduces all of the conditions
in the first bullet point to $\mu \ll 1/(N \log N)$. The conditions in the second bullet point become
$\mu \gg 1/\log N$ or $1 \gg N$, so the conditions in part 5 can only hold if $1/\log N \ll \mu \ll 1/(N \log N)$
which can never happen.

This shows that the probability that the first type-2 mutation occurs on the stem cell converges
to 0. For this reason, we will never consider model $M_3$ in this section.

Proof of part 2 of Proposition 5. This time we first consider independent models $M_1$ and $M_2$,
meaning we do not couple the Poisson processes that mark mutations on the cells within each
model. We construct a new model from models $M_1$ and $M_2$ which we will refer to as model $M_{1,2}^-$.
The Poisson processes of model $M_2$ always mark the cells in model $M_{1,2}^-$. The Poisson processes
that mark the mutations in model $M_1$ mark the daughter cells in model $M_{1,2}^-$ until the stem cell
has a type-1 mutation. After the stem cell has a type-1 mutation, the Poisson processes in model
$M_1$ no longer mark any of the cells in model $M_{1,2}^-$. This way model $M_{1,2}^-$ behaves exactly like
model $M_2$ after the stem cell has a type-1 mutation.

Model $M_{1,2}^-$ is the same as model $H_2$ except that the stem cell cannot get type-2 mutations
and for $\log N$ time after the stem cell receives a mutation the type-2 mutations are suppressed on
daughter cells that have not inherited the type-1 mutation from the stem cell. Let $T$ be the time
at which the stem cell has a mutation and let $T_2 = \tau(M_2) - T$. By the same argument used in
Lemma 17 to show $u_1 X_2 \to_p 0$ we know $\mu T_2 \to_p 0$. Also, from part 1 of Proposition 7 we know
$P(\mu\tau(M_1) > t) \to e^{-At}$. Let $\varepsilon > 0$. Then

$$\limsup P(\mu\tau(H_2) > t) \le \limsup P(\mu\tau(M_{1,2}^-) > t)$$

$$= \limsup\left(P(\{\mu\tau(M_1) > t\} \cap \{\mu T > t\}) + P(\{\tau(M_1) > T\} \cap \{\mu T \le t\} \cap \{\mu\tau(M_2) > t\})\right)$$

$$\le \limsup P(\{\mu\tau(M_1) > t\})P(\{\mu T > t\}) + \limsup P(\{\mu T \le t\} \cap \{\mu\tau(M_2) > t\})$$

$$= e^{-(1+A)t} + \limsup P(\{\mu T \le t\} \cap \{\mu(T + T_2) > t\})$$

$$\le e^{-(1+A)t} + \limsup\left(P(\{\mu T \le t < \mu(T + \varepsilon)\} \cap \{\mu T_2 \le \varepsilon\}) + P(\{\mu T \le t < \mu(T + T_2)\} \cap \{\mu T_2 > \varepsilon\})\right)$$

$$\le e^{-(1+A)t} + \limsup P(\mu T \in [t - \varepsilon\mu, t]) + \limsup P(\mu T_2 > \varepsilon)$$

$$= e^{-(1+A)t}$$

where the third line follows by the independence of $\tau(M_1)$ and $T$ and the last line follows because
$\mu T$ is exponentially distributed, $\mu \to 0$ and $\mu T_2 \to_p 0$. Hence we have $\limsup P(\mu\tau(H_2) > t) \le
e^{-(1+A)t}$.

We define another model, $M_{1,2}^+$, which is the same as model $M_{1,2}^-$ except that we always count
the mutations from model $M_1$. That is, we have models $M_1$ and $M_2$ and we are looking for
the first mutation that occurs on either of these models. We couple model $M_{1,2}^+$ with model
$H_2$ so that before the stem cell has a mutation the Poisson processes marking the mutations in
models $M_1$, $M_2$ and $H_2$ are the same and after the stem cell has a mutation the Poisson processes
marking the mutations in model $M_2$ only mark those generations in model $H_2$ which have not
yet inherited a type-1 mutation from the stem cell. If we wait $\log N$ time after the stem cell has
a mutation then the Poisson processes marking model $M_1$ are not marking model $H_2$. Then

$$\liminf P(\mu\tau(H_2) > t) \ge \liminf P(\mu\tau(M_{1,2}^+) > t)$$

$$= \liminf P(\{\mu\tau(M_1) > t\} \cap \{\mu\tau(M_2) > t\})$$

$$= \liminf P(\mu\tau(M_1) > t)P(\mu\tau(M_2) > t)$$

$$= e^{-(1+A)t}$$

where the last equality follows by part 1 of Proposition 7. Combining this with the above result
we have $\lim P(\mu\tau(H_2) \le t) = 1 - e^{-(1+A)t}$.

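Note that $\limsup P(T = \tau(M_1)) = 0$. Let $\epsilon > 0$. By continuity of measure there exists $\delta > 0$
such that $\limsup P(0 < \tau(M_1) - T < \delta) < \epsilon$. Then

\[
\begin{aligned}
\limsup P(\tau(M_1) < \tau(M_2)) &= \limsup \big( P(\{\tau(M_1) < \tau(M_2)\} \cap \{\tau(M_1) < T\}) + P(\{\tau(M_1) < \tau(M_2)\} \cap \{\tau(M_1) > T\}) \big) \\
&\le \limsup P(\tau(M_1) < T) + \limsup P(T < \tau(M_1) < T + T_2) \\
&\le \frac{A}{1+A} + \limsup P(T < \tau(M_1) < T + \delta \mid \mu T_2 < \delta) \\
&\le \frac{A}{1+A} + \epsilon
\end{aligned}
\]

where the fourth line follows because $\mu T_2 \to_p 0$ and by Lemma 5. Because $\epsilon > 0$ was arbitrary
we have $\limsup P(\tau(M_1) < \tau(M_2)) \le A/(1 + A)$. On the other hand,

\[
\liminf P(\tau(M_1) < \tau(M_2)) \ge \liminf P(\{\tau(M_1) < T\}) = \frac{A}{1+A}.
\]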
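
The two limits above are what one would see for a pair of independent exponential clocks. As a rough numerical illustration only, the sketch below assumes that $\mu\tau(M_1)$ behaves like an exponential variable with rate $A$ and $\mu T$ like an independent exponential variable with rate 1 (an idealization of the limit, not the finite-$N$ model). The function names are hypothetical and the code is not part of the proof.

```python
import math
import random


def simulate_competing_clocks(A, n_samples=200_000, seed=0):
    """Sample two independent exponential clocks.

    Assumption (idealized limit): X ~ Exp(rate A) stands in for mu*tau(M_1)
    and Y ~ Exp(rate 1) stands in for mu*T.
    Returns the fraction of samples with X < Y and the list of min(X, Y).
    """
    rng = random.Random(seed)
    first_wins = 0
    minima = []
    for _ in range(n_samples):
        x = rng.expovariate(A)
        y = rng.expovariate(1.0)
        minima.append(min(x, y))
        if x < y:
            first_wins += 1
    return first_wins / n_samples, minima


def empirical_survival(minima, t):
    """Fraction of sampled minima that exceed t."""
    return sum(1 for m in minima if m > t) / len(minima)


if __name__ == "__main__":
    A = 2.0
    p_first, minima = simulate_competing_clocks(A)
    print("P(X < Y) ~", p_first, "vs A/(1+A) =", A / (1 + A))
    print("P(min > 0.5) ~", empirical_survival(minima, 0.5),
          "vs exp(-(1+A)*0.5) =", math.exp(-(1 + A) * 0.5))
```

In this idealized picture the minimum of the two clocks has survival function $e^{-(1+A)t}$ and the first clock rings first with probability $A/(1+A)$, matching the two limits derived above.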

Let $Z$ be a random variable such that $Z = 1$ if $\tau(M_1) < \tau(M_2)$ and $Z = 0$ otherwise. If
$\tau(M_1) < \tau(M_2)$ then $\sigma(H_2) = \sigma(M_1)$ and $\rho(H_2) = \rho(M_1)$. By Proposition 7 we know that $\sigma(M_1)$
converges in distribution to $U$ and $\rho(M_1)$ converges in probability to 1. The event $\tau(M_1) = \tau(M_2)$
has probability 0. If $\tau(M_1) > \tau(M_2)$ then $\sigma(H_2) = \sigma(M_2)$ and $\rho(H_2) = \rho(M_2)$. By definition of
model $M_2$ we always have $\sigma(M_2) = 0$ and by Proposition 13 $\rho(M_2)$ converges in probability to
$\alpha \wedge 1$. Therefore,

\[
\sigma(H_2) = \sigma(M_1)Z + \sigma(M_2)(1 - Z) \to_d U\xi
\]

and

\[
\rho(H_2) = \rho(M_1)Z + \rho(M_2)(1 - Z) \to_d \xi + (\alpha \wedge 1)(1 - \xi).
\]

□

Let $\mathcal{N}$ be the set of Radon measures $\nu$ on a Polish space $(\Psi, \mathcal{B})$, where $\mathcal{B}$ is the Borel $\sigma$-field,
such that $\nu(\{x\}) \in \mathbb{N} \cup \{0, \infty\}$ for all $x \in \Psi$. For the next proof we will consider a point process
to be a random variable taking on elements of $\mathcal{N}$. We consider $\nu(\{x\})$ to be the number of times
the point $x$ has been marked. For a Poisson point process whose intensity measure has no atoms
$\nu(\{x\})$ is 0 or 1 for all $x$ and $\{x \in \Psi : \nu(\{x\}) > 0\}$ is discrete with probability 1.

Let $\Psi = [0,\infty) \times [0,1]$. The Poisson point process of successful type-1 mutations in model
$M_1$ induces a point process on $\Psi$ where if a successful type-1 mutation occurs at time $t$ on a cell
in generation $i$ in model $M_1$ then there is a point of $\Psi$ at $(t/l, i/l)$. We will call this point process
$P_N$.

Lemma 23. The limiting distribution of $P_N$ is a Poisson point process $P_\infty$ which has intensity
measure $\nu' = A^2(\lambda \times \lambda_{[1/2,1]})$ where $\lambda$ is Lebesgue measure and $\lambda_{[1/2,1]}$ is the measure defined by
$\lambda_{[1/2,1]}(B) = \lambda(B \cap [1/2, 1])$ for any Lebesgue measurable set $B$.

Proof. We let $C_c(\Psi, [-1, 0])$ be the set of continuous functions $h : \Psi \to [-1, 0]$ such that the set
$\{\psi \in \Psi : h(\psi) \ne 0\}$ is precompact. Recall that a point process $X$ has an associated generating
functional $\mathfrak{F} : C_c(\Psi, [-1,0]) \to \mathbb{R}$ defined by

\[
\mathfrak{F}(h) = E\Big[\prod_{\psi \in \Psi} (h(\psi) + 1)^{\nu(\{\psi\})}\Big]
\]

where $\nu$ is a Radon measure on $\Psi$ as described above. Probability generating functionals uniquely
determine the distribution of point processes (see Theorem 14 of Section 29.5 in [4]). Moreover,
a sequence of point processes converges in distribution to a point process if and only if the
corresponding sequence of generating functionals converges pointwise to a functional $\mathfrak{F}$ that satisfies
the following: if $h_m$ is in the domain of $\mathfrak{F}$ for each $m$, $\bigcup_{m=1}^{\infty}\{\psi : h_m(\psi) \ne 0\}$ is relatively
compact, and $h_m(\psi) \to 0$ as $m \to \infty$ for each $\psi$, then $\mathfrak{F}(h_m) \to 1$ as $m \to \infty$. In this case $\mathfrak{F}$ is the
probability generating functional of the limiting point process (see Theorem 20 of Section 29.7
in [4]).

Notice that for any $N$ the points marked in $\Psi$ will all have coordinates $(x, y)$ where $y$ takes
values in $\{1/\log N, 2/\log N, \ldots, 1\}$. We know from the proof of Corollary 11 that the rate
at which mutations occur along generation $i$ is bounded between $2^{i-1}\mu(1 - e^{-\mu(2^{l-i+1}-2)})$ and
$2^{i-1}\mu(1 - e^{-\mu(2^{l-i+1}-1)})$. Therefore, if we look at the points that are marked in $\Psi$ whose
second coordinate is fixed at $i/\log N$, the rate at which the marking will occur will be between
$(\log N)2^{i-1}\mu(1 - e^{-\mu(2^{l-i+1}-2)})$ and $(\log N)2^{i-1}\mu(1 - e^{-\mu(2^{l-i+1}-1)})$, where the $\log N$ appears
because time is scaled by $1/\log N$. This observation will allow us to work with time homogeneous
Poisson point processes.

Let $\mathfrak{F}$ denote the generating functional associated with $P_N$. Let $\mathfrak{F}_1$ be the generating
functional associated with the Poisson process on $\Psi$ which marks points at rate
$(\log N)2^{i-1}\mu(1 - e^{-\mu(2^{l-i+1}-2)})$ on each generation and let $\mathfrak{F}_2$ be the generating functional associated with the
Poisson process on $\Psi$ which marks points at rate $(\log N)2^{i-1}\mu(1 - e^{-\mu(2^{l-i+1}-1)})$ along each
generation. Call the time homogeneous Poisson point processes $P_1$ and $P_2$ respectively. Because
the intensity measure of $P_N$ is always between the intensity measures of $P_1$ and $P_2$ we have the
bounds $\mathfrak{F}_1 \le \mathfrak{F} \le \mathfrak{F}_2$.

Let $X$ be a Poisson process with intensity measure $\nu$. It is known that the probability
generating functional associated with $X$ is

\[
\mathfrak{F}(h) = \exp\Big(\int_\Psi h \, d\nu\Big).
\]

To show a sequence of Poisson processes $\{X_n\}_{n=0}^{\infty}$ with intensity measures $\{\nu_n\}_{n=0}^{\infty}$ converges in
distribution to a Poisson process $X$ with intensity measure $\nu$ it is enough to show that $\{\nu_n\}_{n=0}^{\infty}$
converges weakly to $\nu$. That is, for each $h \in C_c(\Psi, [-1,0])$ we need $\int_\Psi h \, d\nu_n \to \int_\Psi h \, d\nu$ as $n \to \infty$.
Let $\nu_N$ be the intensity measure of $P_1$ when there are $N$ cells in the population and let $\nu'_N$ be
the intensity measure of $P_2$ when there are $N$ cells in the population. The goal is to show $\nu_N$
and $\nu'_N$ both converge weakly to $\nu'$. Then the limiting distribution of $P_N$ will be $P_\infty$.
Let $R = (a, b] \times (c, d] \subset \Psi$. Then



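\[
\nu_N(R) = (b - a)(\log N) \sum_{i \in (lc,\, ld]} 2^{i-1}\mu\big(1 - e^{-\mu(2^{l-i+1}-2)}\big) \to A^2\big(d - c \vee \tfrac{1}{2}\big)^{+}(b - a) = \nu'(R)
\]

by Lemma 10 and the assumption that $\mu \sim A/(\sqrt{N}\log N)$, which implies $\mu^2 N \log N \sim A^2/\log N$.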
Now let $O$ be any open subset of $\Psi$. We can write $O = \bigcup_{n=1}^{\infty} R_n$ where each $R_n$ is a half open
rectangle in the same form as $R$ above and the sets $\{R_n\}_{n=1}^{\infty}$ are pairwise disjoint. Then

\[
\liminf_{N \to \infty} \nu_N(O) = \liminf_{N \to \infty} \sum_{n=1}^{\infty} \nu_N(R_n) \ge \sum_{n=1}^{\infty} \nu'(R_n) = \nu'(O)
\]

where the inequality follows by Fatou's lemma. By the same reasoning $\liminf \nu'_N(O) \ge \nu'(O)$ for
any open subset $O$ of $\Psi$. It follows by the Portmanteau Theorem that both $\nu_N$ and $\nu'_N$ converge
weakly to $\nu'$ as $N$ goes to infinity. Because of the bounds on the linear functionals we have that
the limiting distribution of $P_N$ is a Poisson process with intensity $\nu'$. □
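
As a numerical sanity check on the rectangle computation in this proof, the following sketch evaluates the intensity of the lower process $P_1$ on a rectangle $(a, b] \times (c, d]$ for a finite number of generations and compares it with the limiting value $A^2(b - a)(d - c \vee 1/2)^{+}$. It rests on assumptions that are not spelled out in this part of the text: $N = 2^l$, $\log N$ is read as $l = \log_2 N$, and $\mu = A/(\sqrt{N}\log N)$ exactly. The function names are hypothetical.

```python
import math


def nu_N(a, b, c, d, l, A):
    """Intensity of the lower Poisson process P_1 on (a, b] x (c, d].

    Assumptions (illustrative only): N = 2**l with l even and l <= 1000
    (to stay within float range), log N is read as l, and
    mu = A / (sqrt(N) * log N).
    """
    mu = A / (2.0 ** (l / 2) * l)
    total = 0.0
    for i in range(1, l + 1):
        if c < i / l <= d:
            x = mu * (2.0 ** (l - i + 1) - 2.0)
            # -expm1(-x) computes 1 - exp(-x) accurately for tiny x
            total += l * 2.0 ** (i - 1) * mu * (-math.expm1(-x))
    return (b - a) * total


def nu_limit(a, b, c, d, A):
    """Limiting intensity A^2 (b - a) (d - max(c, 1/2))^+, for d <= 1."""
    return A * A * (b - a) * max(min(d, 1.0) - max(c, 0.5), 0.0)


if __name__ == "__main__":
    for rect in [(0.0, 2.0, 0.6, 0.9), (0.0, 1.0, 0.0, 1.0)]:
        print(rect, nu_N(*rect, l=1000, A=1.0), nu_limit(*rect, A=1.0))
```

For rectangles that stay away from $y = 1/2$ the finite-$l$ value agrees with the limit almost exactly. Near $y = 1/2$ there is a boundary layer of roughly $\log_2 l$ generations, so the agreement there improves only slowly as $l$ grows.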

The notation used in Lemma 23 will also be used in this proof.

Proof of part 4 of Proposition 2. Notice that this is the boundary between two cases that are
determined by model $M_1$. By Corollary 9 we know $\rho(M_1) \to_p 1$ for all conditions that we are
considering. Therefore, $\rho(H_1) \to_p 1$ in this case.

The strategy is to define functions $g$ and $h$ on the set of Radon measures that are continuous
everywhere except a set of measure 0. Then we will apply the Continuous Mapping Theorem
to get the desired convergence in distribution. Let $D$ be the subset of $\mathcal{N}$ such that $\nu \in D$ if
there exists $(x,y) \in \Psi$ and $t \in \mathbb{R}$ such that $\nu(x,y) > 0$ and $\nu(x + t, y + t) > 0$. For all $t > 0$
define sets $T_t = \{(x,y) : 1/2 \le y \le 1 \text{ and } 0 \le x \le y + t - 1\} \subset \Psi$. These sets correspond
to the triangles and quadrilaterals that were shown in the picture in the introduction. Let
$V = \{(x, y) \in \Psi : \nu(x, y) > 0\}$ and define $t_0 = \inf\{t : V \cap T_t \ne \emptyset\}$. Define

\[
g(\nu) = \lim_{\epsilon \to 0} \sup\{y : (x, y) \in V \cap T_{t_0+\epsilon} \text{ for some } x\}
\]

and $h(\nu) = t_0$.
Given a Poisson point process $P$ on $\Psi$ whose intensity has no atoms, we can project the points
of $P$ onto the line $y = -x$ in $\mathbb{R}^2$ along perpendicular angles of $\pi/4$. With probability 1 no two
points of $P$ will be mapped to the same point under the projection. That is, under the law of
$P$, $D$ has probability 0. Moreover, with probability 1 there will be no limit points under the
projection. Therefore, under the intensity measure $A^2(\lambda_{[1/2,1]} \times \lambda)$, there exists a unique point
$(x_0,y_0) \in V \cap T_{t_0}$ and an $\epsilon > 0$ such that $V \cap T_{t_0+\epsilon} = \{(x_0,y_0)\}$ with probability 1. By definition
$g(P) = y_0$. We claim that $g$ and $h$ are continuous at any Radon measure $\nu \in \mathcal{N} \setminus D$.

Let $\nu \in \mathcal{N} \setminus D$ and let $\{\nu_n\}_{n=1}^{\infty}$ be a sequence of Radon measures that converges weakly to $\nu$.
Let $\epsilon > 0$ and let $(x_0,y_0)$ be the unique point of $T_{t_0+\epsilon}$ such that $\nu(x_0,y_0) > 0$. For each point
$(x',y') \in \Psi$ and every natural number $m$ define a function

\[
f_{(x',y'),m}(x,y) =
\begin{cases}
-1 & \text{if } |(x,y) - (x',y')| \le \epsilon/m \\
-(2 - m|(x,y) - (x',y')|/\epsilon) & \text{if } \epsilon/m < |(x,y) - (x',y')| \le 2\epsilon/m \\
0 & \text{otherwise.}
\end{cases}
\]

For $m$ large enough we have $\int_\Psi f_{(x_0,y_0),m}(x,y)\,d\nu = -1$ so $\int_\Psi f_{(x_0,y_0),m}(x,y)\,d\nu_n \to -1$ as $n \to \infty$
for large enough values of $m$. Because we can make $m$ arbitrarily large, there must be a
sequence of points $\{(x_n,y_n)\}_{n=1}^{\infty}$ such that $\nu_n(x_n,y_n) = 1$ for all $n$ and $(x_n,y_n) \to (x_0,y_0)$
as $n \to \infty$. Likewise, for any point $(x',y') \in T_{t_0+\epsilon}$ there exists a large enough $m$ such that
$\int_\Psi f_{(x',y'),m}(x,y)\,d\nu = 0$ so $\int_\Psi f_{(x',y'),m}(x,y)\,d\nu_n \to 0$ as $n \to \infty$. This shows that for $n$ large
enough the Radon measures $\nu_n$ will assign measure 0 to all points in a ball of radius $\epsilon/m$ about
$(x',y')$. From this it is easy to conclude $g(\nu_n) \to g(\nu)$ and $h(\nu_n) \to h(\nu)$. Therefore, $g$ and
$h$ are both continuous on $\mathcal{N} \setminus D$. By Lemma 23 and the Continuous Mapping Theorem $g(P_N)$
converges in distribution to $g(P_\infty)$ and $h(P_N)$ converges in distribution to $h(P_\infty)$.

The next goal is to show that $g(P_N) - \sigma(M_1) \to_p 0$ and $h(P_N) - \tau(M_1)/\log N \to_p 0$. Then we
will have that $\sigma(M_1) \to_d g(P_\infty)$ and $\tau(M_1)/\log N \to_d h(P_\infty)$. To achieve this we will first show
that the probability that $(x_0,y_0)$ corresponds to the cancer causing type-1 mutation converges
to 1. Suppose $(x_0, y_0)$ does not correspond to the cancer causing type-1 mutation and
let $(x_1,y_1)$ denote the point in $\Psi$ corresponding to the cancer causing type-1 mutation in $M_1$.
Let $\epsilon > 0$ and suppose that $(x_1,y_1) \notin T_{t_0+\epsilon}$. The point $(x_0,y_0) \in T_{t_0}$ corresponds to a successful
type-1 mutation in model $M_1$, and by the way that model $M_1$ marks points in $\Psi$ there will be
a type-2 mutation in model $M_1$ that corresponds to a point in $T_{t_0}$. The ray starting at $(x_1,y_1)$
with an angle of $\pi/4$ will represent all of the descendants of the cancer causing type-1 mutation.
The point on this line whose first coordinate is $t_0$ will be $(t_0,y'')$ where $y'' < 1 - \epsilon$. In this case
$\rho(M_1) = y'' < 1 - \epsilon$. Let $E_1$ be the event that $(x_0,y_0)$ is the point in $\Psi$ that corresponds to the
cancer causing type-1 mutation and $E_2$ be the event that two or more points occur in $T_{t_0+\epsilon}$. We
know that $P_N$ converges in distribution to $P_\infty$ by Lemma 23 so

\[
\begin{aligned}
\limsup P(E_1^c) &= \limsup \big( P(E_1^c \cap \{(x_1,y_1) \in T_{t_0+\epsilon}\}) + P(E_1^c \cap \{(x_1,y_1) \notin T_{t_0+\epsilon}\}) \big) \\
&\le \limsup P(E_2) + \limsup P(\rho(M_1) < 1 - \epsilon) \\
&\le \frac{A^2}{2}\epsilon
\end{aligned}
\]

where the last line follows because $\rho(M_1) \to_p 1$ and $P(E_2) \le P(V \cap (T_{t_0+\epsilon} \setminus T_{t_0}) \ne \emptyset)$. Because
$\epsilon > 0$ was chosen arbitrarily, we have $\lim P(E_1^c) = 0$.

The above has established that $\lim P(E_1) = 1$. By definition of $\sigma(M_1)$ and $g(P_N)$ it is clear
that

\[
P(\sigma(M_1) - g(P_N) = 0 \mid E_1) = 1
\]

because $\sigma(M_1) = g(P_N) = y_0$. Conditional on the event $E_1$ we also know that the cancer causing
type-1 mutation occurs at time $(\log N)x_0$. Let $(x_0',y_0')$ be the point in $\Psi$ that corresponds to
the type-2 mutation in $M_1$, so that $\rho(M_1) = y_0'$. Let $\nu$ be the Radon measure of points in $\Psi$
induced by $M_1$ and consider the fact that the descendants of the cancer causing type-1 mutation
will lie on a line starting at $(x_0,y_0)$ with angle $\pi/4$. It is clear that $h(\nu) = t_0 = x_0 + 1 - y_0$ and
$\rho(M_1) = y_0 + \tau(M_1)/\log N - x_0$. Thus, if $h(\nu) - \tau(M_1)/\log N > \epsilon$ then $1 - \rho(M_1) > \epsilon$, or
equivalently $\rho(M_1) < 1 - \epsilon$. Therefore, because $P(E_1) \to 1$,

\[
P(h(P_N) - \tau(M_1)/\log N > \epsilon \mid E_1) = P(\rho(M_1) < 1 - \epsilon \mid E_1) \to 0.
\]

Again using the fact that $P(E_1) \to 1$ we get the desired result.

Now we are left to show that $g(P_\infty)$ and $h(P_\infty)$ have the distributions that are stated in part
4 of Proposition 2. We have $P(h(P_\infty) \le t)$ is the probability that a point of the Poisson process
with intensity $A^2(\lambda_{[1/2,1]} \times \lambda)$ has been marked in $T_t$. For $t \le 1/2$ this is $1 - e^{-A^2t^2/2}$ and for
$t > 1/2$ this is $1 - e^{-A^2t/2 + A^2/8}$. Therefore,

\[
P(\tau(M_1)/\log N \le t) \to (1 - e^{-A^2t^2/2})1_{[0,1/2]}(t) + (1 - e^{-A^2t/2 + A^2/8})1_{(1/2,\infty)}(t).
\]
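
The limiting distribution function above can be checked by simulating the limiting Poisson process directly. The sketch below is only an illustration and uses hypothetical function names. It relies on the observation that a point $(x, y)$ lies in $T_t$ exactly when $t \ge x + 1 - y$, so that $h$ is the minimum of $x + 1 - y$ over the points of the process and $g$ is the second coordinate of the minimizing point.

```python
import math
import random


def sample_h_and_g(A, rng):
    """Draw (h, g) for a Poisson process of intensity A^2 on [0, inf) x [1/2, 1].

    Points are generated in increasing order of x. Because y <= 1, a point
    with first coordinate x has x + 1 - y >= x, so the search can stop as
    soon as x exceeds the best value found so far.
    """
    rate = A * A / 2.0  # points per unit of x (the strip has height 1/2)
    x = 0.0
    best_t, best_y = math.inf, None
    while True:
        x += rng.expovariate(rate)
        if x >= best_t:
            return best_t, best_y
        y = rng.uniform(0.5, 1.0)
        t = x + 1.0 - y
        if t < best_t:
            best_t, best_y = t, y


def cdf_h(t, A):
    """Limiting distribution function of h stated in the text."""
    if t <= 0.0:
        return 0.0
    if t <= 0.5:
        return 1.0 - math.exp(-A * A * t * t / 2.0)
    return 1.0 - math.exp(-A * A * t / 2.0 + A * A / 8.0)


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [sample_h_and_g(1.5, rng) for _ in range(20_000)]
    for t in (0.3, 1.0):
        empirical = sum(1 for h, _ in draws if h <= t) / len(draws)
        print(t, empirical, cdf_h(t, 1.5))
```

The empirical frequencies should be close to the closed form on both branches, $t \le 1/2$ (triangles) and $t > 1/2$ (quadrilaterals).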

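To find the distribution of $g(P_\infty)$ we will use the joint density function of $g(P_\infty)$ and $h(P_\infty)$.
From the above computation it is clear that the density of $h(P_\infty)$ is

\[
f_h(t) = A^2 t e^{-A^2t^2/2}1_{[0,1/2]}(t) + \frac{A^2}{2}e^{-A^2t/2 + A^2/8}1_{(1/2,\infty)}(t).
\]

Conditioned on the event that $h(P_\infty) = t$ we know that $g(P_\infty)$ will have uniform distribution. If
$t \le 1/2$ then $g(P_\infty)$ is uniformly distributed on the interval $[1 - t, 1]$. If $t > 1/2$ then $g(P_\infty)$ is
uniformly distributed on $[1/2, 1]$. This gives us the conditional density function

\[
f_{g|h}(s \mid t) =
\begin{cases}
\frac{1}{t} & \text{if } 1 - t \le s \le 1 \text{ and } 0 < t \le \frac{1}{2} \\
2 & \text{if } \frac{1}{2} \le s \le 1 \text{ and } t > \frac{1}{2}.
\end{cases}
\]

Therefore, the joint density function of $g(P_\infty)$ and $h(P_\infty)$ is

\[
f(s,t) = A^2e^{-A^2t^2/2}1_{[0,1/2]}(t)1_{[1-t,1]}(s) + A^2e^{-A^2t/2 + A^2/8}1_{(1/2,\infty)}(t)1_{[1/2,1]}(s).
\]

Integrating over $t$ we find that the density of $g(P_\infty)$ is

\[
f_g(s) = \Big(A^2\int_{1-s}^{1/2} e^{-A^2t^2/2}\,dt + 2e^{-A^2/8}\Big)1_{[1/2,1]}(s).
\]

This gives the desired limiting distribution for model $M_1$. By the same arguments as used
above, the results will hold for model $H_1$ as well. □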

Proof of part 6 of Proposition 2. First we change model $M_1$ so that only generation $l - 1$ will
get type-1 mutations and generation $l$ will get type-2 mutations. Also, assume that only one of
the daughters will keep a mutation when the cells split so that if a type-1 cell splits it has a
type-0 daughter and a type-1 daughter. The rate at which the type-1 mutations occur will be
$\mu N/4$ since there are $N/4$ cells in generation $l - 1$. Note that $\mu N/4 \sim A\sqrt{N}/4$. The probability
that a type-1 mutation will have a type-2 descendant is $1 - e^{-\mu t} \approx \mu t \approx At/\sqrt{N}$. Therefore,
the type-2 mutations occur according to a Poisson process whose intensity measure $\nu$ satisfies
$\nu([0,t]) \ge (A\sqrt{N}/4)(At/\sqrt{N}) = A^2t/4$. We may have to wait up to two time units for the
type-2 mutation to occur after the successful type-1 appears. For the sake of a lower bound we will
always assume it takes 2 time units after a successful type-1 mutation until the type-2 mutation.
By coupling this model with model $H_2$, this gets us $\liminf P(\tau(M_1) \le t) \ge 1 - e^{-A^2(t-2)/4}$.

For the upper bound we change model $M_1$ so that the type-1 cells never undergo apoptosis.
There are $N$ cells getting type-1 mutations so the type-1 mutations occur at rate $\mu N \sim A\sqrt{N}$. If
we wait $t$ time after a type-1 mutation has occurred the cell will have at most $2^t$ descendants.
If the type-1 mutation had occurred at time 0 and all of the descendants had existed since
the type-1 mutation occurred then the probability that one of the cells had acquired a type-2
mutation would be $t2^{\lfloor t \rfloor}\mu \le t2^t\mu \sim t2^tA/\sqrt{N}$. Because the type-1 mutation may occur after
time 0 and there have not been $2^t$ descendants with the type-1 mutation since the mutation
occurred this is an upper bound on the probability that a type-2 mutation has occurred by
time $t$. Therefore, the type-2 mutations occur according to a Poisson process with intensity
$\nu([0,t]) \le (A\sqrt{N})(t2^tA/\sqrt{N}) = t2^tA^2$. Then $\limsup P(\tau(M_1) \le t) \le 1 - e^{-A^2t2^t}$. This shows
part 6 of Proposition 2 with $c = 1 - e^{-A^2(t-2)/4}$ and $C = 1 - e^{-A^2t2^t}$.

By Corollary 9 we know $\rho(M_1) \to_p 1$. By the definitions of $\sigma(M_1)$ and $\rho(M_1)$, for any $\epsilon > 0$ if
$\rho(M_1) - \sigma(M_1) > \epsilon$ then $\tau(M_1) > \epsilon\log N$. Therefore,

\[
P(\rho(M_1) - \sigma(M_1) > \epsilon) \le P(\tau(M_1) > \epsilon\log N) \le e^{-A^2 2^{\epsilon\log N}(\epsilon\log N)} \to 0.
\]

Let $\epsilon > 0$ and $\delta > 0$ and choose $N$ large enough so that $P(1 - \rho(M_1) > \epsilon/2) < \delta/2$ and
$P(\rho(M_1) - \sigma(M_1) > \epsilon/2) < \delta/2$. Then

\[
\begin{aligned}
P(1 - \sigma(M_1) > \epsilon) &= P(1 - \rho(M_1) + \rho(M_1) - \sigma(M_1) > \epsilon) \\
&\le P(1 - \rho(M_1) > \epsilon/2) + P(\rho(M_1) - \sigma(M_1) > \epsilon/2) \\
&< \delta.
\end{aligned}
\]

Therefore, $\sigma(M_1) \to_p 1$.

Using the same techniques as in the previous sections, we get the same results for $H_1$. □